
Open Philanthropy is launching a big new Request for Proposals for technical AI safety research, with plans to fund roughly $40M in grants over the next 5 months, and available funding for substantially more depending on application quality. 

Applications start with a simple 300-word expression of interest (EOI) and are open until April 15, 2025.

Apply now

Overview

We're seeking proposals across 21 different research areas, organized into five broad categories:

  1. Adversarial Machine Learning
    • *Jailbreaks and unintentional misalignment
    • *Control evaluations
    • *Backdoors and other alignment stress tests
    • *Alternatives to adversarial training
    • Robust unlearning
  2. Exploring sophisticated misbehavior of LLMs
    • *Experiments on alignment faking
    • *Encoded reasoning in CoT and inter-model communication
    • Black-box LLM psychology
    • Evaluating whether models can hide dangerous behaviors
    • Reward hacking of human oversight
  3. Model transparency
    • Applications of white-box techniques
    • Activation monitoring (see the toy sketch after this list)
    • Finding feature representations
    • Toy models for interpretability
    • Externalizing reasoning
    • Interpretability benchmarks
    • More transparent architectures
  4. Trust from first principles
    • White-box estimation of rare misbehavior
    • Theoretical study of inductive biases
  5. Alternative approaches to mitigating AI risks
    • Conceptual clarity about risks from powerful AI
    • New moonshots for aligning superintelligence
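
For readers less familiar with these areas, here is a toy sketch of the flavor of one of them, activation monitoring: training a simple linear probe to flag a behavior from a model's internal activations. Everything below (the synthetic "activations", the shapes, and the probe) is hypothetical and purely illustrative; real projects would probe an actual LLM's hidden states. See the full RFP for actual project descriptions.

```python
# Toy sketch of "activation monitoring": fit a linear probe that
# predicts a behavior label from hidden activations. The activations
# here are synthetic stand-ins (hypothetical), not from a real model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
d_model = 64   # hypothetical hidden size
n = 2000       # number of prompts

# Pretend the flagged behavior shifts activations along one direction.
direction = rng.normal(size=d_model)
labels = rng.integers(0, 2, size=n)  # 0 = benign, 1 = flagged
acts = rng.normal(size=(n, d_model)) + np.outer(labels, direction)

X_tr, X_te, y_tr, y_te = train_test_split(
    acts, labels, test_size=0.25, random_state=0
)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"held-out probe accuracy: {probe.score(X_te, y_te):.2f}")
```

In real work the probe would read activations from a forward pass of an actual model, and the interesting questions include whether such monitors generalize and whether models can evade them.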

We're willing to make a range of grant types, including:

  • Research expenses (compute, APIs, etc.)
  • Discrete research projects (typically lasting 6-24 months)
  • Academic start-up packages
  • Support for existing nonprofits
  • Funding to start new research organizations or new teams at existing organizations

The full RFP provides much more detail on each research area, including eligibility criteria, example projects, and nice-to-haves. 

Read more

We want the bar to be low for submitting expressions of interest: even if you're unsure whether your project fits perfectly, we encourage you to submit an EOI. This RFP is partly an experiment to understand the demand for funding in AI safety research.

Please email aisafety@openphilanthropy.org with questions, or just submit an EOI.

Comments

I'm new to applying for an AIS grant, so I have some common questions that might have been answered elsewhere:

(1) What are some failure modes I might need to consider when writing a proposal, specifically for a research project?

(2) Will research expenses include stipends for the researchers?

(3) Can I apply for a grant to do a research project with my university AI safety group? I'm not sure whether this would be considered a field-building grant or a technical AI safety grant.

Some common failure modes:

  • Not reading the eligibility criteria
  • Not clearly distinguishing your project from prior work on the topic you're interested in
  • Not demonstrating a good understanding of prior work (it's worth reading some or all of the papers we link to in this doc for whichever section you're applying to)
  • Not demonstrating that you/your team has prior experience doing ML projects. If you don't have such experience, it's good to work with, or be mentored by, someone who does.

"Research expeneses" does not include stipends, but you can apply for a project grant, which does.

If you're looking for money to spend on ML experiments or to pay people who are spending their time doing ML research, then that may fall within this RFP. If you're looking for money to do other things (e.g., reading groups or events), then that may fall under the capacity-building team's RFPs.

Has Open Phil (or anyone else) conducted a comprehensive analysis aimed at both understanding and building the AI safety field?

If yes, could you share some leads to add to my research?

If not, would Open Phil consider funding such work? (either under this RFP or other funds)

Here is a recent example: Introducing SyDFAIS: A Systemic Design Framework for AI Safety Field-Building
