Effective Altruism Forum
AI interpretability
• Applied to Takes on "Alignment Faking in Large Language Models" (3d ago)
• Applied to LLM chatbots have ~half of the kinds of "consciousness" that humans believe in. Humans should avoid going crazy about that. (1mo ago)
• Applied to Motivation control (2mo ago)
• Applied to A Rocket–Interpretability Analogy (2mo ago)
• Applied to 5 ways to improve CoT faithfulness (2mo ago)
• Applied to Solving adversarial attacks in computer vision as a baby version of general AI alignment (4mo ago)
• Applied to AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0 (6mo ago)
• Applied to Rational Animations' intro to mechanistic interpretability (6mo ago)
• Applied to ML4Good Brasil - Applications Open (8mo ago)
• Applied to A Selection of Randomly Selected SAE Features (9mo ago)
• Applied to AI alignment as a translation problem (11mo ago)
• Applied to ML4Good UK - Applications Open (1y ago)
• Applied to Assessment of AI safety agendas: think about the downside risk (1y ago)
• Applied to Public Call for Interest in Mathematical Alignment (1y ago)
• Applied to AI Alignment Research Engineer Accelerator (ARENA): call for applicants (1y ago)
• Applied to Announcing Timaeus (1y ago)
• Applied to Don't Dismiss Simple Alignment Approaches (1y ago)
• Applied to Safety-First Agents/Architectures Are a Promising Path to Safe AGI (1y ago)
• Applied to Concrete open problems in mechanistic interpretability: a technical overview (1y ago)