
This is a Draft Amnesty Week draft. It may not be polished, up to my usual standards, fully thought through, or fully fact-checked. It was also written before FTX blew up.

Regarding criticism, fire away.

While I have long intellectually believed that short AI timelines were possible, it wasn't until I interacted with ChatGPT that this belief truly hit home. While its capabilities may not surpass those of PaLM, being able to engage with it made those capabilities feel much more tangible.

I am growing increasingly concerned that our plans are not progressing at a sufficient pace. Therefore, I wanted to paint a picture of what an extremely aggressive approach to tackling alignment might entail. Rather than focusing on what any particular actor could do, I am imagining what could happen if a large number of individuals were motivated to do what is necessary.


The Plan

Elements:

  • We need as many talented people as possible shifting their careers towards alignment:
    • More options than people realise:
      • Technical:
        • Research engineer vs. research scientist
        • "Buying time"/demonstrations
      • Governance:
        • AI ethicist positions are underrated, though working directly on x-risk is likely more impactful
      • Field-Building
      • Ops
      • Communication
    • Likely worth trying if you're unsure
    • Backup options:
      • Skill up and try again later
      • Volunteering on the side:
        • Possibly taking a less strenuous job or working 4 days per week
        • Directly using skills or helping to train people up
      • Performing a "tour of service", where you take a few months' break from your career to volunteer
  • Field-Building:
    • Scope:
      • AI safety-specific field-building is growing
      • Long-termist and existential risk field-building
      • Effective altruism field-building
    • Minimal organising:
      • Dinner or drinks; perhaps with a guest or a lightning talk
    • Light-weight events:
      • Unconferences, knowledge-sharing circles, self-facilitated discussion groups
    • Tentpole events:
      • Fundamentals courses, retreats, bootcamps, fellowships
  • Stampy
  • Someone needs to solve the high cost of living problem

Limits of the "Effectiveness" Frame

I'm starting to suspect that one of the most significant challenges EA will face in addressing this issue is the desire for guarantees that individuals are doing something effective. EA emerged as a reaction to charities that sounded good in theory but were extremely ineffective in practice. Despite EA's growing focus on speculative, long-term causes, this desire for certainty remains ingrained in our DNA.

Unfortunately, life doesn't always offer you certainty. Whether it's moving to a new city, starting a relationship, or changing careers, often all you can do is take your best guess whilst knowing it's still a shot in the dark.

Further, if we are truly operating in a short-timelines scenario, it is reasonable to expect that we will often have to take actions that are far from optimal, similar to how one might hastily grab whatever is within reach in the face of an impending fire and swiftly evacuate the area. It should be noted that this is not a justification for reckless behaviour - needlessly leaping over furniture in a rush to escape is likely to result in injury and worsen the situation - but rather an acknowledgment of the limitations and constraints imposed by such scenarios.

That said, you shouldn't just adhere to the first course of action you choose to undertake. For most people, I recommend picking a robustly good action (i.e. one with no significant downside risks) to pursue whilst also deepening your understanding of the strategic landscape and the alignment problem. This improved theoretical understanding should then be combined with any practical lessons learned to inform a new course of action.

An additional caveat: I suspect at least a few people should go intentionally slow when deciding to take action, although likely not too many. I would only recommend pursuing this strategy if you have high agency, the ability to maintain focus on a task over a long period, excellent judgment, lots of relevant skills, and no tendency to get caught in the trap of endless contemplation. If you lack these qualities, I suspect an iterative strategy will work better for you.
