The Effective Altruism Foundation is launching the EAF Fund (a.k.a. CLR Fund), a new fund focused on reducing s-risks. In this post we outline its mission, likely priority areas, and fund management structure. We also explain when it makes sense to donate to this fund.
Summary
- The fund’s mission is to address the worst s-risks from artificial intelligence.
- Priority areas for grants will likely be decision theory and bargaining, AI alignment and fail-safe architectures, macrostrategy research, and AI governance. There is some chance we might also make grants related to social science research on conflicts and moral circle expansion.
- Fund managers Lukas Gloor, Brian Tomasik, and Jonas Vollmer will make grants by simple majority vote.
- The current balance is $68,638 (as of November 27), and we expect to be able to allocate $400k–$1.5M during the first year. We will likely try different mechanisms for proactively enabling the kind of research we’d like to see, e.g. requests for proposals, prizes, teaching buy-outs, and scholarships.
- You should give to this fund if you prioritize improving the quality of the long-term future, especially with regard to reducing s-risks from AI. You can donate to this fund via the Effective Altruism Foundation (donors from Germany, Switzerland, and the Netherlands) or the EA Funds Platform (donors from the US or the UK).
Mission
The fund’s focus is on improving the quality of the long-term future by supporting efforts to reduce the worst s-risks from advanced artificial intelligence. (edited for clarity; see comment section)
Priority areas
Based on this mission, we have identified the following priority areas, which may shift as we learn more.
Tier 1
- Decision theory. It’s plausible that the outcomes of multipolar AI scenarios are to some degree shaped by the decision theories of the AI systems involved. We want to increase the likelihood of cooperative outcomes, since conflicts are a plausible source of large amounts of disvalue.
- AI alignment and fail-safe architectures. Some AI failure modes are worse than others. We aim to differentially support alignment approaches where the risks are lowest. Work that ensures comparatively benign outcomes in the case of failure is particularly valuable from our perspective. Surrogate goals are one such example.
- Macrostrategy research. There are many unresolved questions about how to improve the quality of the long-term future. Additional research could unearth new crucial considerations which would change our prioritization.
- AI governance. The norms and rules governing the development of AI systems will shape both strategic and technical outcomes. Establishing cooperative and prudential norms in the relevant research communities could be a way to avoid bad outcomes.
Tier 2
- Theory and history of conflict. By studying historical examples and using game-theoretic analysis, we could gain a better understanding of the fundamental dynamics of conflict, which might in turn yield insights applicable to conflicts involving AI systems.
- Moral circle expansion. Making sure that all sentient beings are afforded moral consideration is another fairly broad lever to improve the quality of the long-term future.
Past grants
- $26,000 to Rethink Priorities: We funded their surveys on descriptive population ethics because learning more about these values and attitudes may inform people’s prioritization and potential moral trades. We also made a smaller grant for a survey investigating the attitudes toward reducing wild animal suffering.
- $27,450 to Daniel Kokotajlo: Daniel will collaborate with Caspar Oesterheld and Johannes Treutlein to produce his dissertation at the intersection of AI and decision theory. His project will explore acausal trade in particular and coordination mechanisms between AI systems more generally. This work is relevant to AI safety, AI policy, and cause prioritization. This grant will buy him out of teaching duties during his PhD to allow him to focus on this work full-time.
Fund management
As we learn more, we might make changes to this initial setup.
Fund managers
We chose the fund managers based on their familiarity with the fund’s mission and prioritization, the amount of time they can dedicate to this work, and relevant research expertise. They were approved by the board of EAF.
- Lukas Gloor is responsible for prioritization at the Effective Altruism Foundation, and coordinates our research with other organizations. He conceptualized worst-case AI safety, and helped coin and establish the term s-risks. Currently, his main research focus is on better understanding how different AI alignment approaches affect worst-case outcomes.
- Brian Tomasik has written prolifically and comprehensively about ethics, animal welfare, artificial intelligence, and the long-term future from a suffering-focused perspective. His ideas have been very influential in the effective altruism movement, and he helped found the Foundational Research Institute, a project of the Effective Altruism Foundation, which he still advises. He graduated from Swarthmore College in 2009, where he studied computer science, mathematics, statistics, and economics.
- Jonas Vollmer is the Co-Executive Director of the Effective Altruism Foundation where he is responsible for setting the strategic direction, communications with the effective altruism community, and general management. He holds degrees in medicine and economics with a focus on health economics and development economics. He previously served on the boards of several charities, and is an advisor to the EA Long-term Future Fund.
Grantmaking
The current balance of the fund is $68,638 (as of November 27), and we expect to be able to allocate $400k–$1.5M during the first year. We will likely try different mechanisms for proactively enabling the kind of research we’d like to see, e.g. requests for proposals, prizes, teaching buy-outs, and scholarships.
Given the current state of academic research on s-risks, it’s impossible to find senior academic scholars who could judge the merit of a proposal based on its expected impact. However, we will consult domain experts where we think their judgment adds value to the evaluation. We also ran a hiring round for a research analyst, whom we expect to support the fund managers. They may also take on more grantmaking responsibilities over time.
Grant recipients may be charitable organizations, academic institutions, or individuals. However, we expect to often fund individual researchers and small groups as opposed to large organizations or institutes. Grants are approved by a simple majority of the fund managers. We expect grants to be made at least every six months.
We will experiment with different formats for publishing our reasoning behind individual grant decisions and evaluating past grants (e.g. trying to use predictions). This will likely depend on the number and size of grants.
When should you give to this fund?
CEA has already written up reasons for giving to funds in general. We won’t repeat them here. So when does it make sense to give to this fund in particular?
- You think long-termism, broadly construed, should guide your decisions.
- You think there is a significant chance of AI profoundly shaping the long-term future.
When does it make sense to give to this fund instead of the EA Long-term Future Fund?
- You are interested in improving the quality of the long-term future, addressing s-risks from AI in particular. This might be the result of your normative views (e.g. a strong focus on suffering), of pessimistic empirical beliefs about the long-term future, or of thinking that s-risks are currently neglected.
- You trust the judgments of the fund managers or the Effective Altruism Foundation.
How to donate to this fund
You can donate to this fund via the Effective Altruism Foundation (donors from Germany, Switzerland, the Netherlands) or the EA Funds Platform (donors from the US or the UK).
Note: Until December 29 donations to the EAF Fund can be matched 1:1 as part of a matching challenge. (For the matching challenge we’re still using the former name “REG Fund”.)
Happy to see this being set up! It makes a lot of sense to me that our community gradually gets more and more competition from different funds. Over time we'll get evidence on which ones seem to perform best.
A few questions:
1) Are you planning on coordinating information and decisions with other funds? In general, my impression is that there should be similar application procedures and perhaps some information sharing. That said, I could imagine ways in which coordination could come across badly.
2) Where do you expect to get most of your donations from? Individual EA donors?
3) Your Tier-1 and Tier-2 areas don't seem exclusive to s-risks; they seem to me like things that would be very applicable to AI-risk concerns more broadly. Does this sound correct to you? Are you seeking projects that focus specifically on s-risks within these areas?
Thanks! :)
1) The Long-Term Future Fund seems most important to coordinate with. Since I'm both a fund manager at the EAF Fund and an advisor to the Long-Term Future Fund, I hope to facilitate such coordination.
2) Individual EA donors, poker pros (through our current matching challenge), and maybe other large donors.
3) Yes, that sounds correct. We're particularly excited to support researchers who work on specific s-risk-related questions within those areas, but I expect that the research we fund could also positively influence AI in other ways (e.g. much of the decision theory work might make positive-sum trade more likely and could thereby increase the chance of realizing the best possible outcomes). We might also fund established organizations like MIRI if they have room for more funding.
Just wanted to note that the use of "worst case" in the mission statement
The fund’s mission is to address worst-case risks (s-risks) from artificial intelligence.
is highly non-intuitive for people with a different axiology. Quoting from the s-risk explanation:
For instance, an event leading to a future containing 10^35 happy individuals and 10^25 unhappy ones would constitute an s-risk
At least for me, this would be a pretty amazing outcome, and not something which should be prevented.
In this context
We aim to differentially support alignment approaches where the risks are lowest. Work that ensures comparatively benign outcomes in the case of failure is particularly valuable from our perspective
sounds worrisome: do I interpret it correctly that, in the ethical system held by the fund, human extinction is a comparatively benign outcome in comparison with risks like the creation of 10^25 unhappy minds, even if they are offset by a much larger number of happy minds?
Yeah, we're going to change the part that equates "worst case" with "s-risks". Your view is common and reflects many ethical perspectives.
We were already thinking about changing the definition of "s-risk" based on similar feedback, to make it more intuitive and cooperative in the way you describe. It probably makes more sense to have it refer to only the few % of scenarios where most of the future's expected suffering comes from (assuming s-risks are indeed heavy-tailed). These actual worst cases are what we want to focus on with the fund.
No, that's incorrect. Insofar as some fund managers hold this view personally (e.g., I do, while Jonas would agree with you that the latter outcome is vastly better), it won't affect decisions because in any case, we want to avoid doing things that are weakly positive on some plausible moral views and very negative on others. But I can see why you were concerned, and thanks for raising this issue!
I guess to me, the part of the future with 10^25 unhappy individuals sounds like an s-risk. I would imagine an s-outcome could take place in a universe that's still net positive overall. Just because the universe may be net positive, though, doesn't mean we shouldn't be concerned about large s-outcomes that may happen.
Yeah. I put it the following way in another post:
So it'd be totally fine to address all sources of unnecessary suffering (and even "small" s-risks embedded in an otherwise positive future) if there are targeted ways to bring about uncontroversial improvements. :) In practice, it's sometimes hard to find interventions that are targeted enough because affecting the future is very very difficult and we only have crude levers. Having said that, I think many things that we're going to support with the fund are actually quite positive for positive-future-oriented value systems as well. So there certainly are some more targeted levers.
There are instances where it does feel justified to me to also move some probability mass away from s-risks towards extinction (or paperclip scenarios), but that should be reserved either for uncontroversially terrible futures, or for those futures where most of the disvalue for downside-focused value systems comes from. I doubt that this includes futures where 10^10x more people are happy than unhappy.
And of course positive-future-oriented EAs face analogous tradeoffs of cooperation with other value systems.
We at CLR are now using a different definition of s-risks.
New definition:
S-risks are risks of events that bring about suffering in cosmically significant amounts. By “significant”, we mean significant relative to expected future suffering.
Note that it may turn out that the amount of suffering that we can influence is dwarfed by suffering that we can’t influence. By “expectation of suffering in the future” we mean “expectation of action-relevant suffering in the future”.
I'm wondering a bit about this definition. One interpretation of it is that you're saying something like this:
"The expected future suffering is X. The risk that event E occurs is an S-risk if and only if E occurring raises the expected future suffering significantly above X."
But I think that definition doesn't work. Suppose that it is almost certain (99.9999999%) that a particular event E will occur, and that it would cause a tremendous amount of suffering. Then the expected future suffering is already very large (if I understand that concept correctly). And, because E is virtually certain to occur, it occurring will not actually bring about suffering in cosmically significant amounts relative to expected future suffering. And yet intuitively this is an S-risk, I'd say.
Another interpretation of the definition is:
"The expected future suffering is X. The risk that event E occurs is an S-risk if and only if the difference in suffering between E occurring and E not occurring is significant relative to X."
That does take care of that issue, since, by hypothesis, the difference between E occurring and E not occurring is a tremendous amount of suffering.
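To make the contrast concrete, here is a toy calculation (the probabilities and suffering amounts are made up purely for illustration):

```python
# Toy comparison of the two readings of the definition (illustrative numbers only).
p_E = 0.999999999          # E is virtually certain to occur
suffering_if_E = 1e30      # suffering caused by E (arbitrary units)
baseline = 1e20            # expected suffering from all other sources

# Expected future suffering already "prices in" the near-certain event E.
expected_suffering = p_E * suffering_if_E + baseline

# Interpretation 1: how far does E occurring raise suffering above the expectation?
increase = (suffering_if_E + baseline) - expected_suffering
print(increase / expected_suffering)   # ~1e-9, i.e. insignificant -> not an s-risk

# Interpretation 2: difference between E occurring and E not occurring,
# compared with expected future suffering.
difference = suffering_if_E
print(difference / expected_suffering) # ~1, i.e. clearly significant -> an s-risk
```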
Alternatively, you may want to say that the risk that E occurs is an S-risk if and only if its occurrence brings about a significant amount of suffering relative to what we expect to occur from other causes. That may be a more intuitive way of thinking about this.
A feature of this definition is that the risk of an event E1 occurring can be an S-risk even if its occurring would cause much less suffering than another event E2 would, provided that E1 is much more likely to occur than E2. But if we increase our credence that E2 will occur, then the risk of E1 occurring will cease to be an S-risk, since it will no longer cause a significant amount of suffering relative to expected future suffering.
I guess that some would find that unintuitive, and that something being an S-risk shouldn't depend on our adjusting our credences in independent events in this way. But it depends a bit on what perspective you have.
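Here is the same kind of toy calculation for this credence-dependence point (again with purely illustrative numbers and an arbitrary significance threshold):

```python
# How E1's classification can flip as our credence in E2 changes (illustrative numbers only).
def is_s_risk(suffering_from_event, total_expected_suffering, threshold=0.1):
    # Call an event an s-risk if its suffering is "significant" relative to
    # total expected future suffering; the threshold is an arbitrary stand-in.
    return suffering_from_event / total_expected_suffering >= threshold

s1, p1 = 1e25, 0.5   # E1: less suffering, but fairly likely
s2 = 1e30            # E2: far more suffering

for p2 in (1e-9, 1e-2):            # low vs. higher credence in E2
    expected = p1 * s1 + p2 * s2   # total expected future suffering
    print(p2, is_s_risk(s1, expected))
# p2 = 1e-9: expected ~ 5e24, so E1's 1e25 is significant -> E1 counts as an s-risk.
# p2 = 1e-2: expected ~ 1e28, so E1's 1e25 is insignificant -> E1 no longer counts.
```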
Great to see this set up!
Small note on the name: it is not at all clear to people unfamiliar with EAF what the fund is about, and a name change would probably get you more attention. Something like "suffering prevention fund", "suffering reduction fund", "long-term suffering reduction fund", or "suffering risk fund" would all be significantly clearer (even though they all feel inadequate to describe the fund's goals).
Thanks for the feedback! I think part of the challenge is that the name also needs to be fairly short and easy to remember. "Long-Term Future Fund" is already a bit long and hard to remember (people often seem to get it wrong), so I'm nervous about making it even longer. We seriously considered "S-risk Fund" but ultimately decided against because it seems harder to fundraise for from people who are less familiar with advanced EA concepts (e.g., poker pros interested in improving the long-term future). Also, most people who understand the idea of s-risks will also know that EAF works on them.
I'd be curious to hear whether the above points were convincing, or whether you'd still perceive it as suboptimal.
I still perceive it as suboptimal, although I understand you don't like any of the potential names.
I think this touches on a serious worry with fundraising from people unfamiliar with EA concepts: why should they donate to your fund rather than another EA fund if they don't understand the basic goal you are aiming for with your fund?
I can imagine that people donate to the EAF fund for social reasons (e.g. you happen to be well-connected to poker players) more than for intellectual reasons (i.e. because they prioritize s-risk reduction). If that is the case, I'd find it problematic that the fund is not clearly named: it makes it less likely that people donate to particular funds for the right reasons.
Of course, this is part of a larger coordination problem in which all kinds of non-intellectual reasons are driving donation decisions. I am not sure what the ideal solution is*, but I wanted to flag this issue.
*Perhaps it should be a best practice for EA fundraisers to recommend that all funders go through a (to-be-created) donation decision tool that takes them through some of the relevant questions. That tool would be a bit like the flowchart from the Global Priorities Project, but more user-friendly.
We always point out that the fund is focused on reducing suffering in the long-term future.
Also, why should they donate to that other fund instead? E.g., the Long-Term Future Fund is also importantly motivated by "astronomical waste" type considerations which those donors don't understand either, and might not agree with.
Yeah. This will always be the case with many donors, regardless of which fund they donate to.
I wouldn't call it a coordination problem in the game-theoretic sense, and I think in many cases this actually isn't even a problem: I think it's important that donors aren't deceived into supporting something that they wouldn't want to support; but in the many cases where donors don't have informed opinions (e.g., on population ethics), it's fine if you fill in the details for them with a plausible view held by a significant part of the community.
I think we'd be open to doing something like this.
Yes, I'm not saying you're misleading your donors, nor that they are less informed than donors of other funds. Just that there are many reasons people are donating to a particular fund, and I think properly naming a fund is a step in the right direction.
I see it as coordination between different fund managers, where each wants to maximize the amount of funds for their own fund. As such, there are some incentives not to fully inform one's donors of other funding possibilities, or of the best arguments against donating to the fund they are fundraising for.
I'm not saying that this type of selfish behavior is very present in the EA community - I've heard that it is quite the opposite. But I do think that the situation is not yet optimal: the current allocation of resources is not based largely on carefully weighing the relevant evidence and arguments. I also think we can move closer to this optimal allocation.
Anyway, I didn't mean to make this into a large debate :) I'm glad the fund exists and I'd be happy if the name changes to something more distinguishable!
Thanks! I see the point better now. While I don't fully agree with everything, I think it could make sense to rename the fund if/once we have a good idea.