Effective Altruism Forum

Discovering Language Model Behaviors with Model-Written Evaluations

Dec 20, 2022 · 1 min read

AI safety, AI alignment, AI risk, Research summary

More from evhub
