Effective Altruism Forum

[ Question ]

Has Anthropic already made the externally legible commitments that it planned to make?

Mar 12 · 1 min read · 1 answer

AI safety, AI governance

A year ago (2023-03-08) Anthropic published an announcement that included the following:

In the near future, we also plan to make externally legible commitments to only develop models beyond a certain capability threshold if safety standards can be met, and to allow an independent, external organization to evaluate both our model’s capabilities and safety.

Has Anthropic made such externally legible commitments so far?

Mar 12, 2024

Yes, I presume this is referring to their Responsible Scaling Policy.

Thanks!

Follow-up questions for anyone who may know:

Is METR (formerly ARC Evals) meant to be the "independent, external organization" that is allowed to evaluate the capabilities and safety of Anthropic's models? As of 2023-12-04, METR was spinning off from the Alignment Research Center (ARC) into its own standalone 501(c)(3) nonprofit, according to its website. Who is on METR's board of directors?

Note: OpenPhil seemingly recommended a total of $1,515,000 to ARC in 2022. According to Wikipedia, Holden Karnofsky (co-founder and co-CEO of OpenPhil at the time, and currently a board member) is married to Daniela Amodei (co-founder of Anthropic and sibling of Anthropic's CEO, Dario Amodei).

Habryka, Mar 13

(Just for the record, I don't think METR would be accurately described as an independent organization, though I also don't see any other candidate organization that is better placed. But inasmuch as Anthropic promised it would find an independent organization, METR, in my opinion, does not qualify.)
