Since 2017, EA Funds has been providing grants across four distinct cause areas. While payout reports are available, there is a lack of reports detailing the outcomes of these grants, so, out of curiosity, I looked into the Grants Database to review some of the proposals that received funding and evaluate their outcomes.
Some of the findings were quite unexpected, mainly for the Long-Term Future Fund and the EA Infrastructure Fund.
The case involving a $100,000 grant for a video game
In July 2022, EA Funds approved a $100,000 grant to Lone Pine Games, LLC, for developing and marketing a video game designed to explain the Stop Button Problem to the public and STEM professionals.
What I found when looking into Lone Pine Games, LLC:
- After almost two years, there are no online mentions of such a game being developed by this company, except for the note on the EA Funds page.
- Lone Pine Games has released only one game, NewCity, a city-building simulator similar to the SimCity titles of the 1990s, which launched on multiple gaming platforms in 2020.
- After a few updates, the development of NewCity was abandoned last year, and its source code was made public on GitHub.
Video trailer for the game NewCity, which was very likely cited as the primary track record in the grant proposal for the Stop Button Problem video game.
Despite the absence of any concrete results from this grant, let’s entertain the idea that the Stop Button Problem game was produced and that it was decent. Would such a game "positively influence the long-term trajectory of civilization," as described by the Long-Term Future Fund? For context, Rob Miles's videos (1) and (2) from 2017 on the Stop Button Problem already provided clear explanations for the general public.
It seems insane to even compare, but was this expenditure of $100,000 really justified when these funds could have been used to save 20–30 children's lives or provide cataract surgery to around 4,000 people? With a grant like this one, Steven Pinker's remarks on longtermism funding seem increasingly accurate. Training guide dogs for the visually impaired, often cited in EA discussions as the textbook example of ineffective giving, starts to look like a highly effective donation strategy by contrast.
I selected this example because it involves a company rather than individual recipients. However, I found numerous other cases, many even worse, primarily involving digital creators with barely any content produced during their funding period, people requesting substantial financial support to change careers, and independent researchers whose proposals had not, at the time of writing, resulted in any published papers. As Yann LeCun recently said, “If you do research and don't publish, it's not science.” These grants are not just a few thousand dollars distributed here and there; in many instances, they amount to tens or even hundreds of thousands of dollars.
Many people, myself included, strongly support funding people's passion projects, and some of these grants really do look that way, but this approach seems more akin to Patreon or Kickstarter than to effective giving. I believe many donors who think they are contributing effectively may not be fully aware of how their money is being used. Donors contribute to these funds expecting rigorous analysis comparable to GiveWell's standards, even for more speculative areas that rely on hypotheticals, and hoping their money is not wasted. They entrust that responsibility to EA fund managers, whom they assume make better and more informed decisions with their contributions.
With seven full years of funding on record, I believe a thorough evaluation of previous grants is needed. Even if the grants were provided with no strings attached, it is important to assess, from a broad perspective, whether they achieved their intended objectives.
I was the primary grant evaluator for this project. I think this kind of stuff is of course extremely high variance to fund, and I probably wouldn't make the grant today (both based on the absence of high-quality outputs, and based on changes in the funding landscape).
Note that this grant was made at the very peak of the period of very abundant (partially FTX-driven) EA funding where finding good funding opportunities was extremely hard.
I think video games are a pretty promising medium to explain a bunch of safety ideas in. I agree the creator doesn't seem to have done very much with the grant (though to be clear, they might still publish something), which makes the grant bad in retrospect.
In my experience this is the default outcome of founding almost any early-stage entrepreneurial project like this, which does sure make this area hard to think about. But like, yeah, most software startups and VC investments don't really produce anything. $100k is not a lot for a competent software engineer, and if you paid market salary that would pay for ~2-5 months of engineering time. Wasting that much time is a common thing to happen in entrepreneurial efforts (and I've wasted many engineering years within Lightcone Infrastructure on efforts that didn't really pay off in any meaningful way, though I think my successes paid for those losses with lots of slack to spare).
I also want to provide a clear response to this quote:
Please please please, for god's sake, do not expect any kind of rigorous analysis comparable to GiveWell's for anyone trying to direct funding towards interventions that help the long term future. Such analysis does not exist, and if someone tries to sell you on them, they are lying to you. Understanding the impact of global health interventions is much much easier, has a much longer history, and much more established methodology, than anyone working on long term future stuff. If you desire a level of legibility, reproducibility and rigor comparable to GiveWell analyses, I recommend you stick to global health charities (though I wouldn't recommend it, since I think your expected impact would likely be multiple orders of magnitude smaller). Do not give to the LTFF.
(I also work at the Long-Term Future Fund)
Yeah this is the obvious dynamic going on here, thanks for pointing it out.
I'm skeptical. My current opinion of "edutainment for GCR stuff" is the same as my opinion of edutainment more broadly: "Sounds like a good idea at first glance, basically no wins. Probably doomed." I'd be curious to see your arguments here, or case studies.
It's very salient to me that the very successful paperclips game managed to hit many hard parts of doing a game on AI Safety:
But despite the fairly impressive output, the game, AFAIK, has ~zero traceable long-term impact. I'm not aware of anybody who was convinced to work on technical AIS as a result of the game, or people who said that their world-models for AI risk were improved, or significant pieces of communication that built on the game, or tangible advocacy or policy wins.
This somewhat updates me against the genre overall. Since the paperclips game was quite successful on standard metrics (and was designed by a professor of video game design), I think most would-be grantees should be expected to develop worse games (or leave them incomplete), making them even less likely to have long-term impact.
Oh, I actually know of multiple people who told me they found a bunch of safety ideas because of the universal paperclips game. My guess is that it would have very likely been worth $100k+ by my lights. Of course this kind of thing would require proper surveying to identify, but my guess is if you included a question for it, you would have it show up for at least 1-2 people in the Open Phil survey, though I am definitely not confident.
While I'm also sceptical of this type of grant, I think this sort of comment is fundamentally misunderstanding marketing, which is what it sounds like this game essentially was. I'd be hard pressed to name anyone who made a decision based on a single advert, yet thousands of companies pay vast sums of money to produce them.
When your reach is high enough (and 450k unique visitors in 11 days is a very large reach compared to, say, a 2-year-old intro video by Robert Miles, which has 150k total views to date), even an imperceptibly small nudge can have a huge effect in expectation.
The comparison to Robert Miles is pretty apt imo, because I'm indeed aware of people who trace their decisions to work on AI safety to Rob Miles' videos.
I played the paperclips game 6-12 months before reading Superintelligence (which is what convinced me to prioritize AI x-risk), and I think the game made these ideas easier for me to understand and internalize.
Are you sure there are basically no wins? Kaj Sotala has an interesting anecdote about the game DragonBox in this blog post. Apparently it's a super fun puzzle game that incidentally teaches kids basic algebra.
When I was a kid, I played some edugames of the form "pilot a submarine, dodge enemies, occasionally a submarine-themed math problem pops up". I'm not excited about that sort of game. I'm more excited about what I'd call a "stealth edugame" -- a game that would sell just fine as an ordinary game, but teaches you useful knowledge that happens to be embedded in the game mechanics. Consider the game Railroad Tycoon 2. It's not marketed as an edugame, and it's a lot of fun, but as you play you'll naturally pick up some finance concepts like: debt and equity financing, interest rates, the business cycle, profit and loss, dividends, buying stock on margin, short selling, M&A, bankruptcy, liquidation, etc. You'll get an intuitive idea of what supply and demand are, how to optimize your operations for profitability, and how to prioritize investments based on their net present value.
Another example along the same lines -- not primarily edutainment, but apparently law professors play clips of that movie in their classes because it is so accurate.
Nope, not sure at all. Just vague impression.
@Kaj_Sotala wrote that post 11 years ago, titled "Why I’m considering a career in educational games." I'd be interested to see if he still stands by it and/or has more convincing arguments by now.
I think that some of the bits in that essay were too strong, in particular this line
was probably wrong, for reasons Andy Matuschak outlines:
On the other hand, in principle it still seems to me like you should be able to make games that significantly improve on current education. Even if an edugame wasn't as fun as a pure entertainment game, it could still be more fun than school. And people still watch documentaries because they value learning, even though documentaries can't compete with most movies and TV shows on pure entertainment value.
But then again, for some reason DragonBox seems to have been an exception rather than the rule. Even the company that made it mostly just made games for teaching simpler concepts to younger kids afterward, rather than moving on to teaching more complicated concepts. The fact that I haven't really heard of even reasonably-decent edugames coming out in the 11 years since that post seems like strong empirical evidence against its thesis, though I don't really understand the reason for that.
What were the conditions of the grant? What follow-up was there after the grant was made? Was there a staged payment schedule based on intermediate outputs? If this grant went to a for-profit and no output was produced, can the money be clawed back?
I don't have the specific grant agreement in front of me and feel somewhat uncomfortable disclosing more information about this application before running the request by the grantees. I'm happy to share the following thoughts, which I believe address most of your questions, but I'm sorry if you are mostly interested in this specific case as opposed to the more general situation.
For all grants, we have grantees sign a grant agreement that outlines the purpose and use of the grant, record-keeping requirements, monitoring, prohibited use, situations where we are entitled to recover the grant, limitation of liability etc.
Grantees submit progress reports every six months; these are useful inputs in evaluating future grants to the grant recipient. We are figuring out how to provide more accountability to speculative projects, but we will likely (for some projects) have more intensive check-ins. We've been experimenting with hosting office hours for current grantees, and some have used it to update us on their progress and get feedback (though the calls are for whatever use the grantee would find most helpful).
There was no staged payment schedule for this grant. We have done this in the past, and it might have been good to do so in this case - but at the time the grant was made, there were substantial resources for longtermist grantmaking. I don't think that this grant would be above our current bar. If we were to make a similar grant again, we'd likely try to tie payouts to concrete criteria (I am in several discussions with applicants where we are trying to figure out what those criteria should be, though it's worth noting that it's a lot of effort - it could easily double the normal time taken to evaluate an application). It's somewhat unclear whether that's worth the cost, but at least right now I am excited about doing more of this work.
I'm not sure about clawing back money. We often include unsatisfactory progress as a condition for clawback in grant agreements - but I'm nervous about exercising this without clear, upfront, mutually agreed definitions of satisfactory progress. I am much more comfortable exercising clawback options when grantees have used the grant "for purposes other than those for which they have been awarded". Grantees often check in with me about repurposing their grant or looking to return funding when they feel they have underperformed or would like to end the project early.
For a grant of this nature, where the grantee does not produce the work product envisioned by the grant, I'd at least want to see that the grantee had devoted an appropriate amount of time to attempting to complete the grant. Here, that might be something like 0.4-0.5 FTE-years (somewhat making up a number here). To the extent that the grantee both did not produce the output and did not spend enough time attempting to do so in light of the grant amount, then I would view that as ~ using the grant for unauthorized purposes.
That isn't the best possible system, but at least it is low cost and promotes donor confidence (insofar as it at least should ensure that the grantee made an honest effort and didn't slack off / enrich themselves).
That makes sense. I currently believe that the grantee did honour their commitment re hours spent on this project, and if I came to believe otherwise I would be much more inclined to claw back funding.
(You didn't explicitly make this claim, but I'd like to push back somewhat on people with unsuccessful longtermist projects "slacking off". In general, my impression from speaking to grantees (including those with failed projects) is that they are overworked rather than underworked relative to "normal jobs" that pay similarly or are similarly challenging/selective.)
This sounds plausible. Such evaluation involves time costs, but could yield valuable info about the reliability of the fund's grantmaking decisions, whether the "hits" sufficiently compensate for the "duds", and whether there are patterns among the duds that might helpfully inform future grantmaking.
I'd be a bit surprised if there wasn't already a process in place for retrospective analysis of this sort. Is there any public info available about if/how EA Funds does this?
I'm a bit wary of picking out weird-sounding proposals as "obviously" ex ante duds. Presumably a lot of the "digital content" grants were aimed at raising public awareness of key longtermist issues (e.g. AI safety), and it seems prima facie reasonable to think both that (i) a computer game could reach a different audience from youtube videos, and (ii) raising awareness of key longtermist issues is a helpful first step for making broader progress on them.
For people who disagree with (ii), I think a more general post critiquing the very ideas of movement-building and raising awareness as valuable strategies could be interesting (and maybe more productive than picking out particular attempts that just "sound weird" to a general audience)?
When I looked at this as part of the 2022 red teaming contest, I found that “EA Funds has received roughly $50 million in donations and has made hundreds of grants, but has never published any post-grant assessments.” I’m almost positive there haven’t been any retrospective analyses of EA Funds grants since then.
This problem isn’t unique to EA Funds. I also found that EA Grants and the Community Building Grants program both lacked any kind of public post-grant assessment.
The unfortunate result of this situation is that while lots of time and money have been invested in various grantmaking programs, we don’t really know much about what types of grantmaking are most effective (e.g. granting to individuals vs. established organizations). It’s true that post-grant assessment is costly to conduct, but it’s disappointing that we haven’t made this investment which could significantly improve the efficacy of future grantmaking.
There was an LTFF evaluation a few years ago.
I wonder if you could make post-grant assessment really cheap by automatically emailing grantees some sort of Google Form. It could show them what they wrote on their grant application and ask them how well they achieved their stated objectives, plus various other questions. You could have a human randomly audit the responses to incentivize honesty.
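For what it's worth, here is a minimal sketch of how that automation could look, assuming a Google Form with prefillable fields and a CSV export of the grants database. Everything specific below (the form ID, the entry field IDs, the sender address, the SMTP server, and the grants.csv layout) is a made-up placeholder for illustration, not anything EA Funds actually uses:

```python
# Hypothetical sketch: email each grantee a prefilled self-assessment form.
# Form ID, entry field IDs, addresses, and the CSV layout are placeholders.
import csv
import smtplib
from email.message import EmailMessage
from urllib.parse import quote_plus

FORM_ID = "1FAIpQLSe_EXAMPLE"            # placeholder Google Form ID
GRANT_ID_FIELD = "entry.1111111111"      # placeholder prefill field IDs
APPLICATION_FIELD = "entry.2222222222"

def prefilled_form_url(grant_id: str, application_text: str) -> str:
    """Build a prefilled-form link so the grantee sees their original application."""
    return (
        f"https://docs.google.com/forms/d/e/{FORM_ID}/viewform?usp=pp_url"
        f"&{GRANT_ID_FIELD}={quote_plus(grant_id)}"
        f"&{APPLICATION_FIELD}={quote_plus(application_text)}"
    )

def send_survey(row: dict, smtp: smtplib.SMTP) -> None:
    """Send one self-assessment request based on a row of the grants export."""
    msg = EmailMessage()
    msg["From"] = "grants@example.org"
    msg["To"] = row["grantee_email"]
    msg["Subject"] = "Six-month self-assessment for your grant"
    msg.set_content(
        "Hi, please rate how well you achieved the objectives you described "
        "in your application:\n\n"
        + prefilled_form_url(row["grant_id"], row["application_summary"])
    )
    smtp.send_message(msg)

if __name__ == "__main__":
    with open("grants.csv", newline="") as f, smtplib.SMTP("localhost") as smtp:
        for row in csv.DictReader(f):
            send_survey(row, smtp)
```

The human auditor could then sample some fraction of the submitted responses and spot-check them against public outputs, as suggested above.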
Wow, I didn't realize that evaluation existed! Thanks for sharing! (Though given that this evaluation only covers ~2 dozen grants for one fund, I think my overall assessment that there's little in the way of post-grant evaluation still holds).
Self-assessment via a simple google form is an interesting idea. My initial reaction is that it would be hard to structure the incentives well enough for me to trust self-assessments. But it still could be better than nothing. I'd be more excited about third party evaluations (like the one you shared) even if they were extremely cursory (e.g. flagging which ones have evidence that the project was even executed vs. those that don't) and selective (e.g. ignoring small grants to save time/effort).
Yeah, to be clear, I am also quite sad about this. If I had more time next to my other responsibilities, I think doing better public retrospectives on grants the LTFF made would be one of my top things to do.
"Id be curious to see more analysis here. If it is the case that a very large fraction of grants are useless, and very few produce huge wins, then I agree that that would definitely be concerning."
This wouldn't necessarily be concerning to me, if the wins are big enough. If you have a "hits based" approach then maybe 1 in 5 (or 1 in 10) huge wins is fine if you are getting enormous impact from those.
I would LOVE to see a proper evaluation of "hits based" funding from funders like OpenPhil and LTFF (I mentioned this a while back). To state the obvious, a "hits based" approach only makes sense if you actually hit every now and then - are we hitting? I would hope also that there was a pre-labelling system of which grants were "hits based", so there wasn't cherry-picking at evaluation time biasing the results towards success or failure.
One possibility would be for these orgs to pay an external evaluator to look at these, to reduce bias. Above someone mentioned that 2-8% of the budget could be spent on evaluations - how about something like 2% of the money? For the LTFF, using 2% of grant funds to fund an external evaluation of grant success would, at a million-a-year budget, come to $60,000 to assess around 3 years of grants - I'm sure a very competent person could do a pretty good review in 4-6 months for that money.
Somewhat unrelated, but since people are discussing whether this example is cherry-picked vs. reflective of a systemic problem with infrastructure-related grants, I'm curious about the outcome of another, much larger grant:
Has there been any word on what happened to the Harvard Square EA coworking space that OP committed $8.9 million to and that was projected to open in the first half of 2023?
I appreciate the investigation, but have mixed feelings about these points.
> Would such a game "positively influence the long-term trajectory of civilization," as described by the Long-Term Future Fund? For context, Rob Miles's videos (1) and (2) from 2017 on the Stop Button Problem already provided clear explanations for the general public.
It sounds like you're arguing that no other explanations are useful, because Rob Miles had a few videos in 2017 on the issue? As much as I'd like it to be the case that something just has to be explained in one way to one group at one time, and everyone else will figure that out, generally disseminating even simple ideas is a lot of work. The basics of EA are very simple, but we still need to repeat them in many ways to many groups to make an even limited impact.
> It seems insane to even compare, but was this expenditure of $100,000 really justified when these funds could have been used to save 20–30 children's lives or provide cataract surgery to around 4000 people?
These are totally different modes of impact. I assume you could make this argument for any speculative work. There are many billions of dollars spent each year on research and projects that end up as failures.
I'm scared of this argument because it's easy to use it to attack any speculative work. "A Givewell analyst spent 3 months investigating economic policies in India and that didn't work out? They could have saved 5 lives!"
I also want to flag that $100k sounds like a lot to some individuals, but in practice, often buys frustratingly little when spent on western professionals. One good software developer can easily cost $200k-$300k per year, all things included, if employed.
> With seven full years of funding on record, I believe a thorough evaluation of previous grants is needed.
I also like grant evaluation, but I would flag that it's expensive, and often, funders don't seem very interested in spending much money on it. One question is how much of the total LTFF budget should go to grant evaluation. I'd expect probably 2-8% is reasonable, but funders might not love this.
> However, I found numerous other cases, many even worse, primarily involving digital creators with barely any content produced during their funding period, people needing big financial support to change careers, and independent researchers whose proposals had not, at the time of writing, resulted in any published papers.
I'd be curious to see more analysis here. If it is the case that a very large fraction of grants are useless, and very few produce huge wins, then I agree that that would definitely be concerning.
I would flag that I think many of us are fairly frustrated by opportunities in longtermism. There aren't as many very clear wins as we'd like, so a lot of the funding is highly speculative right now.
--
Lastly, I'd of course grant that this project looks very underwhelming now. I have no idea what the story was behind funding it - I assume that there was some surprising evidence of promise, but it didn't work out for reasons. I'm assuming that the team spent a few months on it, but it didn't seem promising enough to continue. This situation is common in these sorts of projects, which can be very hit-or-miss.
Finally, kudos for focusing on an organization instead of an individual - this seems like a safe choice to me.
I worry the most about this:
You and I understand the current SotA for longtermist opportunities. The best a visitor to the EA Funds page gets is:
(In low-contrast text, no less—the rest is presented as being equivalent to the other funds). I don’t have evidence for this claim, but I’m concerned that longtermist funds are drawing false equivalences; that most funders would assume their risk profiles are merely 1-10x worse when they may be orders of magnitude worse than that.
But, on the other hand, I don’t know how bad the problem is. It feels subjectively easy to cherry-pick joke projects, but as you note, these are small change compared to the huge amounts of money these funds have to give out. I don’t know if these projects make up the bulk of those getting this funding.
Hi Ozzie, I typically find the quality of your contributions to the EA Forum to be excellent. Relative to my high expectations, I was disappointed by this comment.
This struck me as strawmanning.
I'm more sympathetic to this, but I still didn't find your comment to be helpful. Maybe others read the original post differently than I did, but I read the OP as simply expressing the concept "funds have an opportunity cost" (arguably in unnecessarily hyperbolic terms). This meant that your comment wasn't a helpful update for me.
On the other hand, I appreciated this comment, which I thought to be valuable:
Thanks for the comment Sanjay!
I think your points are quite fair.
1. I agree my sentence "It sounds like you're arguing that no other explanations are useful, because Rob Miles had a few videos in 2017 on the issue?" was quite overstated. I apologize for that.
That said, I'm really not sure whether the presence of the Rob Miles videos decreased the value of future work much. Maybe by something like 20%? I could also see situations where the response was positive, revealing that more work here would be more valuable, not less.
All that said, my guess is that this point isn't particularly relevant, outside of what it shows of our arguing preferences and viewpoints. I think the original post would have a similar effect without it.
That's relevant to know, thanks! This wasn't my takeaway when reading it (I tend to assume that it's clear that funds have opportunity costs, so focused more on the rest of the point), but I could have been wrong.
In particular, I'd like to see analysis of a fair[1] sample.
I don't think we would necessarily need to see a "very large fraction" be "useless" for us to have some serious concerns here. I take Nicolae to raise two discrete concerns about the video-game grant: that it resulted in no deliverable product at all, and that it wouldn't have been a good use of funds even if it had. I think the quoted analysis addresses the second concern better than the first.
If there are "numerous other cases, many even worse, . . . involving digital creators with barely any content produced during their funding period," then that points to a potential vetting problem. I can better see the hits-based philanthropy argument for career change, or for research that ultimately didn't produce any output,[2] but producing ~no digital output that the grantee was paid to create should be a rare occurrence. It's hard to predict whether any digital content will go viral / have impact, but the content coming into existence at all shouldn't be a big roll of the dice.
I used "fair" rather than "random" to remain agnostic on weighting by grant size, etc. The idea is representative and not cherry-picked (in either direction).
These are the other two grant types in Nicolae's sentence that I partially quoted in my sentence before this one.
That true statement seemingly misses the forest for the trees, because money going further overseas is an effective altruism tenet:
https://www.givewell.org/giving101/Your-dollar-goes-further-overseas
The experience of software outsourcing is that replacing expensive western software devs with cheaper foreign devs is often much more expensive than people expect. You can make a decent business from doing so, but it's no free lunch (unlike for GiveDirectly, where $->utils is straightforwardly better in the third world) and I wouldn't fault a startup for exclusively hiring expensive American devs.
+1 to this.
All the best tech companies have strong incentives to try to save money, but most still end up spending heavily in the US.
Add to that the fact that EAs who apply are heavily selected from western countries.
All that said, I do support trying to outsource some software and other work. I've had some success outsourcing technical and operations work, and will continue to try to do so in the future. I think different organizations have different advantages here, depending on their unique circumstances. (If your organization needs to be around other EAs, it might need to be based in the Bay / DC / London. If the management is already in a cheap place and prefers remote work, it's easier to be more remote-friendly.)
@Larks @Ozzie Gooen @huw I worked a decade in tech, and tradeoffs justifiably prevent outsourcing everything. The truism that frustratingly little commonly gets delivered for $100k felt like the original comment simply reiterating the realities behind the complaint. Questioning rather than defending status quo spending is still an effective altruism tenet. To clarify, I'd rather not fund anyone anywhere working on unpublished AI video games.
Equally, the best talent from non-Western countries usually migrates to Western countries where wages are orders of magnitude higher. So this ends up being self-reinforcing.
I agree with the general desire for retrospective analysis, and don't have any particular defense of this grant (save that it seems a bit cherry-picked). But I think this is mistaken:
Giving money to malaria charities would have been out of mandate for the LTFF. Their website is clear that if you want to fund global health and development you should give elsewhere. Whether or not this particular grant was justified depends on the comparison with other in-mandate grants and potential grants. The only justification LTFF requires to not fund cataract surgery is that doing so would violate their commitments to donors (unless for some reason they thought cataract surgery was the best way to improve the long-term future).
I think this is a good point when considering the decisions of LTFF.
However, you can raise a similar question about funders who decide whether some money goes to LTFF or to some global health charity. And such funders exist (e.g. OpenPhil).
And these funders have to make the uncomfortable decision whether to save children's lives or indirectly support some research, video games, and other things via LTFF.
Comment reposted elsewhere
I think you are replying to the wrong comment here.
Sorry, you're right!
I think it's important that the author had this expectation. Many people initially got excited about EA because of the careful, thoughtful analysis of GiveWell. Those who are not deep in the community might reasonably see the branding "EA Funds" and have exactly the expectations set out in this quote.
I think it's very plausible that EA Funds, or LTFF specifically, should rebrand to remove "EA" from the name. I think it'd be a bit of a loss because I view us as trying to do something fully central to what I believe to be the core of EA: trying to make the best decisions we can given the limited resources we have. But communication of what "EA" means to different people has been at best mixed, and it's understandable if other people take a different position (e.g. if they believe that EA is about making high-quality decisions about altruistic activities with uniformly high rigor and transparency).
And this isn't really a question with a truth of the matter. Words are made by men, etc.
So plausibly we should move away from that brand, for this and several other reasons.
IMO if EA funds isn't representative of EA, I'm not sure what is. I think the different funds do a good job of accurately representing the broad diversity of viewpoints and approaches within the community, and I would personally be very sad if EA funds dropped the EA branding.
Thanks. I appreciate your kind words.
I think there's a consistent view where EA is about doing careful, thoughtful analysis with uniformly and transparently high rigor, communicating those analyses transparently and legibly, and (almost) always making decisions entirely according to such analyses as well as strong empirical evidence. Under that view GiveWell, and for that matter JPAL, is much more representative of what EA ought to be about than what at least LTFF tries to do in practice.
I don't know how popular the view I described above is. But I definitely have sympathy towards it.
Right now, we already do quite a few things to manage expectations and make the speculative nature of our grants as upfront as possible. Do you have suggestions for how we can improve on that front?
tl;dr:
1. I think the level of rigorous analysis for LTFF grants is not comparable to GiveWell's standards. I'm sorry if I ever gave that impression, and am happy to correct that impression wherever I can.
2. The average LTFF grant size is around $40,000, while the average GiveWell grant is over $5 million, indicating a substantial difference in effort put into each grant.
3. Reasoning about existential risks and the long-term future is very difficult due to a lack of RCTs, sign confusions, and the rapidly changing landscape.
4. LTFF primarily aims to provide seed funding for potentially high-impact, long-tail projects, particularly in AI safety, with the hope that larger funders will support the projects if and when they are ready to scale.
5. For those interested in funding more (relatively) rigorous projects in the longtermist or global catastrophic risk space, you may wish to directly support established organizations like the Nuclear Threat Initiative or Johns Hopkins Center for Health Security. But please be aware that they're still much, much more speculative than GiveWell's recommendations.
____
Longer comment:
I work for, and make grants for, the Long-Term Future Fund. I was a fund manager at the time this grant was made, but I was not a primary investigator on this grant, and I believe I did not vote on it.
Thank you for the post!
I think Caleb and Ozzie both already made some points I wanted to make. So I just wanted to give some context on a few things that are interesting to me.
I'm sorry if we gave the impression that we arrived at our grants with the level of rigorous analysis comparable to GiveWell's standards. I think this is false, and I'm happy to dispel any impressions that people have of this.
From the outside view, my impression is that the amount of work (and money) that's put into each grant at the Long-Term Future Fund is much lower than the amount that's put into each of GiveWell's charity evaluations. For context, our median grant is about $33,000 and our average grant is about $40,000[1]. In comparison, if I'm reading this airtable correctly, the average GiveWell grant/recommendation is for over $5 million.
This means that there is over a 100x difference between the size of the average GiveWell grant and the size of the average LTFF grant. I'm not sure how much difference in effort the difference in dollar amount translates to, but if anything I would guess that the difference in effort is noticeably higher, not lower, than 100x.
So unless you think we're over 100x more efficient than GiveWell (we're not), you should not think of our analysis as similarly rigorous to GiveWell's, just from an outside view look at the data.
From an inside view, I think it's very difficult to reason correctly about existential risks or the long-term future. Doing this type of reasoning is extremely important, but also very tricky. There is a profound lack of RCTs, sign confusions are abundant, and the space is moving very quickly, with safeguards very much not keeping up. So I think it's not possible to be as rigorous as GiveWell, even if we wanted to be.
Which brings me to my next point: We also mostly don't view ourselves as "trying to be as rigorous as GiveWell, but worse, and for longtermism." Instead we view our jobs primarily as making grants that are more like seed funding for long-tail, potentially highly impactful projects, particularly in AI safety. The implicit theory of change here is that other larger funders (Open Phil, other philanthropic foundations, corporate labs, maybe governments one day) can pick up the work if and when the projects make sense to scale.
If you're very interested in funding (relatively) rigorous projects in the longtermist or GCR space, a better option than LTFF might be to directly fund larger organizations with a more established track record, like the Nuclear Threat Initiative or Johns Hopkins Center for Health Security. To a lesser extent, places that are significant but have a shorter track record like SecureBio and Center for AI Safety.
Numbers pulled from memory. Exact numbers depend on how you count but I'd be surprised if it's hugely different. See eg this payout report.
I think this is a reasonable take in its own right, but it sits uncomfortably with Caleb Parikh's statement in a critical response to the Nonlinear Fund that 'I think the current funders are able to fund things down to the point where a good amount of things being passed on are net negative by their lights or have pretty low upside.'
I personally am not excited about making these kinds of grants and think there are now much more cost-effective opportunities within AI safety (in part because of progress reports on these kinds of “speculative advocacy” grants rarely panning out though we haven’t made many of them). I’ll nudge the primary investigator to see if they want to explain their reasoning for this grant here (if it’s not already in a payout report).
I agree that we should have more public retrospective evaluation of our grants. We commissioned some work in this space but were unhappy with the results, and I am trying to figure out better solutions, but I suspect that it should be a priority for the next half of the year. I don’t expect grants of this nature to feature prominently in that work (as we do relatively little advocacy, and the advocacy grants we have made that are notable are, imo very unfortunately, private - but by dollar amount it’s very low).
I think the broad areas for the LTFF which would be valuable to retrospectively evaluate are:
It's also worth noting that this grant was made in a time when AIS/EA had a lot more philanthropic capital so the funding bar was much lower (and in my opinion there were way fewer promising projects to fund). Maybe we should indicate that in the public grants database?
I think that this paragraph seems more important than the individual case highlighted in this article:
If you made a record of these grants would you be interested in sharing them with me? I’d like to check them against our internal progress reports, I think if there’s a large mismatch in opinion on the success of grants we should potentially put more effort into communicating why we think the grants are valuable.
I'm curious how we can improve the messaging around the speculative nature of Long-Term Future Fund grants. Here's what we have on the top of the front page of LTFF:
And here's what you see if you click through to the "learn more" link:
I also try my best to communicate which things we fund, and don't/didn't fund. I also think I've been consistently candid on this issue in my private communications with donors over the last 9 months or so. So I hoped it would be clear from our public and private communication.
But hope is not a strategy. This is not the first time that people have complained about LTFF's risk profile. Unfortunately, I would be surprised if it's the last. Nonetheless, I'd be keen to know how we(I) can improve our communications here, to make this point as unambiguous as possible and reduce future confusions or missed expectations.
(edited for lucidity)
Maybe you could rename the LTFF as the Speculative Long Term Future Fund (SLTFF) or the Moonshot Fund. That is, make it clear that the LTFF is EA Funds' "Moonshot-focused" grantmaking arm. Here's a draft writeup you can use to replace the LTFF description:
I'd be interested to see explanations from the disagree-voters (even short ones would be useful). Was it the proposed renaming? The description draft? Something else?
I also don’t know how much optics should factor into grantmaking decisions. One of the lessons that I hoped the movement learned from Wytham Abbey is to broaden our EV calculations for capacity-building projects to factor in potential criticism and backlash which would turn off new recruits (and thus have the net effect of diminishing capacity).
This project, frankly, reads like a joke and would surely be off-putting for some new EAs to learn about. It and similar projects have definitely made me second-guess my identification with this movement. That feels like a relevant factor in any CEA.
At least my grantmaking on the LTFF does not take into account such considerations naively in the way you suggest here (and is very unlikely to do so in the future).
"I was a fan of Effective Altruism (almost taught a course on it at Harvard) together w other rational efforts (evidence-based medicine, data-driven policing, randomista econ). But it became cultish. Happy to donate to save the most lives in Africa, but not to pay techies to fret about AI turning us into paperclips. Still support the idea; hope they extricate themselves from this rut." - Steven Pinker
I think the pile-on of post-hoc rationalizations trying to defend or excuse this grant is evidence of the rot in EA captured in Steven Pinker's comment. People are earnestly defending the idea that $100k on a Bay Area software salary for a speculative video game is worthy of the EA label. Can we at least all agree that this money would have been better spent by GiveWell?
Why is it so hard to say that the grant was a mistake not only in hindsight, but at the time it was made?
In my comment, I wrote:
This seems like the opposite of a "post-hoc rationalization"? I'm drawing on general principles that I apply similarly to any like case. I just think it's very hard to assess which speculative longtermist efforts are genuinely good bets or not, and even silly-sounding ones like a computer game could, given the stakes, be better in expectation than a more-certain but vastly lower-stakes win like those found in Global Health & Development. It really depends upon how promising an avenue it seemed for raising awareness of AI risk.
If you have a substantive argument against the principles I'm relying on, I'm all ears! But just calling them "rot" isn't particularly convincing. (It just makes me think that you don't understand where I, and others who think similarly, are coming from.)
The post-hoc rationalization is referring to the "Note that this grant was made at the very peak of the period of very abundant (partially FTX-driven) EA funding where finding good funding opportunities was extremely hard."
If it wasn't a good opportunity, why was it funded?
Why does "infrastructure" and longtermist funding rely so heavily on pascal-mugging with evidence-free hypotheticals?
I can easily craft a hypothetical in the other direction on the video game. Perhaps funding such a game reinforces the impression that EA is a self-serving cult (as Steven Pinker does), causing more people to distance themselves from any longtermist ideas. It certainly has done so with me. Wasn't accounting for negative impacts the point of your recent post on the messiness of bringing hypotheticals into the real world?
To answer your second question: I think it's in the nature of seeking "systemic change" that it depends upon speculative judgment-calls, rather than the sort of robust evidence one gets for global health interventions.
I don't think that "crafting a hypothetical" is enough. You need to exercise good judgment to put longtermism into practice. (This is a point I've previously made in response to Eric Schwitzgebel too.) Is any given attempt at longtermist outreach more likely to sway (enough) people positively or negatively? That's presumably what the grantmakers have to try to assess, on case-by-case basis. It's not like there's an algorithm they can use to determine the answer.
Insofar as you're assuming that nothing could possibly be worth doing unless supported by the robust evidence base of global health interventions, I think you're making precisely the mistake that the "systemic change" critics (mistakenly) accuse EA of.
That doesn't sound like post-hoc rationalization to me. They're just providing info on how the funding bar has shifted. A mediocre opportunity could be worth funding when the bar is low (as long as the risks were also low).
I do think there are things worth funding for which evidence doesn't exist. The initial RNA vaccine research relied on good judgement around a hypothetical, and had a hard time getting funding for lack of evidence. It ended up being critical to saving millions of lives.
I think there are more ways some sort of evidence can be included in grantmaking. But the core of the criticism is about judgement, and I think a $100k grant for 6 months of video game developer time, or $50k grants to university student group organizers, represent poor judgement (EAIF and LTFF grants). These grants have caused reputational harm to the movement, and that should have been easy to foresee. What has been the hit to fundraising for EA global health and animal welfare causes from the fallout from bad longtermism bets (FTX/SBF included)?
On the rationalization. Perhaps it isn't a post-hoc rationalization, more of an excuse. It is saying "the funding bar was low, but we still think the expected value of the video game is more important than 25 lives". That's pretty crass. And probably worse than just the $100k counterfactual because of reputational spillover to other causes.
I mean, there are pretty good theoretical reasons for thinking that anything that's genuinely positive for longtermism has higher EV than anything that isn't? Not really sure what's gained by calling the view "crass". (The wording may be, but you came up with the wording yourself!)
It sounds like you're just opposed to strong longtermism. Which is fine, many people are. But then it's weird to ask questions like, "Can't we all agree that GiveWell is better than very speculative longtermist stuff?" Like, no, obviously strong longtermists are not going to agree with that! Read the paper if you really don't understand why.
I really don't think it's fair to conflate speculative-but-inherently-innocent "bets" of this sort with SBF's fraud. The latter sort of norm-breaking is positively threatening to others - an outright moral violation, as commonly understood. But the "reputational harm" of simply doing things that seem weird or insufficiently well-motivated to others seems very different to me, and probably not worth going to extremes to avoid (or else you can't do anything that doesn't sufficiently appeal to normies).
Perhaps another way to put it is that even longtermists have obvious reasons to oppose SBF's fraud (my post that you linked to suggested that it was negative-EV for longtermist goals). But I think strong longtermists should generally feel perfectly comfortable defending speculative grants that are positive-EV and the only "risk" is that others don't judge them so positively. People are allowed to make different judgments (as long as they don't harm anyone). Let a thousand flowers bloom, and all that.
Insofar as your real message is, "Stop doing stuff that looks weird, even if it is perfectly defensible by longtermist lights, simply because I have neartermist values and disagree with it," then that just doesn't actually seem like a reasonable ask?
I think that longtermism relies on more popular, evidenced-based causes like global health and animal welfare to do its reputational laundering through the EA label. I don't see any benefit to global health and animal welfare causes from longtermism. And for that reason I think it would be better for the movement to split into "effective altruism" and "speculative altruism" so the more robust global health and animal welfare causes areas don't have to suffer the reputational risk and criticism that is almost entirely directed at the longtermism wing.
Given the movement is essentially driven by Open Philanthropy, and they aren't going to split, I don't see such a large movement split happening. So I may be inclined towards some version of, as you say, "Stop doing stuff that looks weird, even if it is perfectly defensible by longtermist lights, simply because I have neartermist values and disagree with it." The longtermist stuff is maybe like 20% of funding and 80% of reputational risk, and the most important longtermist concerns can be handled without the really weird speculative stuff.
But that's irrelevant, because I think this ought to be a pretty clear case of the grant not being defensible by longtermist standards. Paying Bay Area software development salaries to develop a video game (why not a cheap developer literally anywhere else?) that didn't even get published is hardly defensible. I get that the whole purpose of the fund is to do "hits based giving". But it's created an environment where nothing can be a mistake, because it is expected most things would fail. And if nothing is a mistake, how can the fund learn from mistakes?
Ok, so it sounds like your comparisons with GiveWell were an irrelevant distraction, given that you understand the point of "hits based giving". Instead, your real question is: "why not [hire] a cheap developer literally anywhere else?"
I'm guessing the literal answer to that question is that no such cheaper developer applied for funding in the same round with an equivalent project. But we might expand upon your question: should a fund like LTFF, rather than just picking from among the proposals that come to them, try taking some of the ideas from those proposals and finding different (perhaps cheaper) PIs to develop them?
It's possible that a more active role in developing promising longtermist projects would be a good use of their time. But I don't find it entirely obvious the way that you seem to. A few thoughts that immediately spring to mind:
(i) My sense of that time period was that finding grantmakers was itself a major bottleneck, and given that longtermism seemed more talent-constrained than money-constrained at that time, having key people spend more time just to save some money presumably would not have seemed a wise tradeoff.
(ii) A software developer that comes to you with an idea presumably has a deeper understanding of it, and so could be expected to do a better job of it, than an external contractor to whom you have to communicate the idea. (That is, external contractors increase risk of project failure due to miscommunication or misunderstanding.)
(iii) Depending on the details, e.g. how specific the idea is, taking an idea from someone's grant proposal to a cheaper PI might constitute intellectual theft. It certainly seems uncooperative / low-integrity, and not a good practice for grant-makers who want to encourage other high-skilled people with good ideas to apply to their fund!
Presumably there's some probability X of averting doom that you would consider more important than 25 statistical lives. I'd also guess that you'd agree that this is true for some rather low-but-nonPascalian probabilities. Eg, I predict that if you thought about the problem even briefly, you'd agree the above claim is true for X=0.001%, not just say 30%.
(To be clear I'm definitely not saying that the grant's effect size is >0.001% in expectation).
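To make the implied arithmetic explicit (assuming, purely for illustration, roughly 8 billion lives at stake; that figure is my assumption, not the commenter's):

$$
\underbrace{10^{-5}}_{X = 0.001\%} \times \underbrace{8 \times 10^{9}}_{\text{lives at stake}} = 80{,}000 \ \text{expected lives} \gg 25.
$$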
So then the real disagreement is either a) What X ought to be (where I presume you have a higher number than LTFF), or b) whether the game is above X.[1]
Stated more clearly, I think your disagreement with the grant is "merely" a practical disagreement about effect sizes. Whereas your language here, if taken literally, is not actually sensitive to the effect size.
(My own guess is that the grant was not above the 2022 LTFF bar, but that's an entirely different line of reasoning). And of course implicitly I believe the 2022 LTFF bar was above the 2022 GiveWell bar by my lights.
A butterfly flaps its wings and causes a devastating hurricane to form in the tropics. Therefore, we must exterminate butterflies, because there is some small probability X that doing so will avert hurricane disaster.
But it could just as easily be the case that the butterfly flaps prevent devastating hurricanes from forming. Therefore we must massively grow their population.
The point being, it can be practically impossible to understand the causal tree and get even the sign right around low-probability events.
That's what I take issue with - it's not just the numbers, it's the structural uncertainty of cause and effect chains when you consider really low probability events. Expected value is a pretty bad tool for action relevant decision making when you are dealing with such numerical and structural uncertainty. It's perhaps better to pick a framework like "it's robust under multiple decision theories" or "pick something that has the least downside risk".
In our instance, two competing plausible structural theories among many are something like: "game teaches someone an AI safety concept -> makes them more knowledgeable or inspires them to take action -> they work on AI safety -> solve alignment problem -> future saved" vs. "people get interested in doing the most good -> they see a community of people that claim to do that, but that fund rich people to make video games -> causes widespread distrust of the movement -> strong social stigma develops against people that care about AI risk -> greatly narrowed range of people / worldviews because people don't want to associate -> makes it near impossible to solve alignment problem -> future destroyed"
The justifications for these grants tend to use some simple expected value calculation of a singular rosy hypothetical causal chain. The problem is that it's possible to construct a hypothetical value chain to justify any sort of grant. So you have to do more than just make a rosy causal chain and multiply numbers through. I've commented before on some pretty bad ones that don't pass the laugh test among domain experts in the climate and air quality space.
The key lesson from early EA (evidence-based giving in global health) was that it is really hard to understand whether the thing you are doing is having an impact, and what the valence of that impact is, even for short, measurable causal chains. EA's popular causes now (longtermism) seem to jettison that lesson, when it is even more unclear what the impact and sign are through complicated low-probability causal chains.
So it's about a lot more than effect sizes.
Worth noting that even GiveWell doesn't rely on a single EV calculation either (however complex). Quoting Holden's 10 year old writeup Sequence thinking vs. cluster thinking:
To the downvoters: my understanding of negative karma is that it communicates "this comment is a negative epistemic contribution; its existence is bad for the discussion." I can't imagine that anyone of intellectual honesty seriously believes that of my comment. Please use 'disagree' votes to communicate disagreement.
[Edit to add: I don't really think people should be downvoting Matthew's comments either. It's a fine conversation to be having!]
I have empathy towards your position, but I think Pinker's quote aged very poorly in 2024, to put it mildly. My guess is it'd be obvious enough to even Pinker by 2029, but the future is hard and we shall see.
I pretty strongly disagree with the thrust of this post. As other commenters have pointed out, public outreach projects are inevitably heavy-tailed ex-ante, and if we think that public outreach projects are worth funding then most outputs will be duds. One could make the argument that public outreach projects aren't worth it in expectation, or that there's some good reason to think that it's silly to fund video games for public outreach, but generically picking on grants that look bad ex-post seems like a very bad way to evaluate funding strategies to me.
With all due respect to Yann LeCun, in my view he is as wrong here as he is dismissive about the risks from AGI.
Publishing is not an intrinsic and definitional part of science. Peer reviewed publishing definitely isn't--it has only been the default for several decades to a half century or so. It may not be the default in another half century.