Effective Altruism Forum
AI interpretability
• Applied to Rational Animations' intro to mechanistic interpretability 2d ago
• Applied to ML4Good Brasil - Applications Open 1mo ago
• Applied to A Selection of Randomly Selected SAE Features 3mo ago
• Applied to AI alignment as a translation problem 4mo ago
• Applied to ML4Good UK - Applications Open 5mo ago
• Applied to Assessment of AI safety agendas: think about the downside risk 6mo ago
• Applied to Public Call for Interest in Mathematical Alignment 7mo ago
• Applied to AI Alignment Research Engineer Accelerator (ARENA): call for applicants 7mo ago
• Applied to Announcing Timaeus 8mo ago
• Applied to Don't Dismiss Simple Alignment Approaches 8mo ago
• Applied to Safety-First Agents/Architectures Are a Promising Path to Safe AGI 10mo ago
• Applied to Concrete open problems in mechanistic interpretability: a technical overview 1y ago
• Applied to Announcing Apollo Research 1y ago
• Applied to Why and When Interpretability Work is Dangerous 1y ago
• Applied to Call for Pythia-style foundation model suite for alignment research 1y ago
• Applied to High-level hopes for AI alignment 1y ago
• Applied to PhD Position: AI Interpretability in Berlin, Germany 1y ago
• Applied to If interpretability research goes well, it may get dangerous 1y ago