Doctor from NZ, independent researcher (grand futures / macrostrategy) collaborating with FHI / Anders Sandberg. Previously: Global Health & Development research @ Rethink Priorities.
Feel free to reach out if you think there's anything I can do to help you or your work, or if you have any Qs about Rethink Priorities! If you're a medical student / junior doctor reconsidering your clinical future, or if you're quite new to EA / feel uncertain about how you fit in the EA space, have an especially low bar for reaching out.
Outside of EA, I do a bit of end-of-life care research and climate change advocacy, and outside of work I enjoy some casual basketball, board games and good indie films. (Very) washed-up classical violinist and Oly-lifter.
All comments in personal capacity unless otherwise stated.
While I think both sides are valuable, I agree with the anon here - I don't think these tradeoffs are particularly relevant to a community health team investigating interpersonal harm cases with the goal of "reduc[ing] risk of harm to members of the community while being fair to people who are accused of wrongdoing".
One downside of having the badness of, say, sexual violence[1] be mitigated by the accused person's perceived impact (how is the community health team actually measuring this? How good someone's forum posts are? Whether they work at an EA org? Whether they are "EA leadership"?) when considering what the appropriate action should be (if this is happening) is that it plausibly leads to different standards for bad behaviour. By the community health team's own standards, taking someone's potential impact into account as a mitigating factor seems like it could increase the risk of harm to members of the community (by not taking sufficient action, with the justification of perceived impact), while being more unfair to people who are accused of wrongdoing. To be clear, I'm basing this off the forum post, not any non-public information.
Additionally, a common theme in basically every sexual violence scandal that I've read about is that there were warnings beforehand (often multiple) that were not taken seriously.
If there is a major sexual violence scandal in EA in the future, it will be pretty damning if the warnings and concerns were clearly raised, but the community health team chose not to act because they decided it wasn't worth the tradeoff against the person/people's impact.
Another point is that the people who are considered impactful are likely to be somewhat correlated with those who have gained respect and power in the EA space, who have seniority or leadership roles, etc. Given the role that abuse of power plays in sexual violence, we should be especially cautious of considerations that might indirectly favour those who have power.
More weakly, even if you hold the view that it is in fact the community health team's role to "take the talent bottleneck seriously; don’t hamper hiring / projects too much" when responding to, say, a sexual violence allegation, it seems like it would be easy to overvalue the badness of taking immediate action against the person (in terms of their lost impact), and undervalue the badness of many more people opting not to get involved, or distancing themselves from the EA movement, because they perceive it to be an unsafe place for women, with unreliable ways of holding perpetrators accountable.
That being said, I think the community health team has an incredibly difficult job, and while they play an important role in mediating community norms and dynamics (and thus have a corresponding amount of responsibility), it's always easier to make comments of a critical nature than to make the difficult decisions they have to make. I'm grateful they exist, and don't want my comment to come across as an attack on the community health team or its individuals!
(commenting in personal capacity etc)
If this comment is more about "how could this have been foreseen", then this comment thread may be relevant. I should note that hindsight bias means it's much easier to look back and assess problems as obvious and predictable ex post, especially given that powerful investment firms and individuals who had skin in the game also missed this.
TL;DR:
1) There were entries that were relevant (this one also touches on it briefly)
2) They were specifically mentioned
3) There were comments relevant to this. (notably one of these was apparently deleted because it received a lot of downvotes when initially posted)
4) There have been at least two other posts on the forum prior to the contest that engaged with this specifically
My tentative take is that these issues were in fact identified by various members of the community, but there isn't a good way of turning identified issues into constructive actions - the status quo is that we just have to trust that organisations have good systems in place for this, and that EA leaders are sufficiently careful and willing to make changes or consider them seriously, such that all the community needs to do is "raise the issue". And I think looking at the systems within the relevant EA orgs or leadership is what investigations or accountability questions going forward should focus on - all individuals are fallible, and we should be looking at how we can build systems such that the community doesn't have to just trust that the people who have power and who are steering the EA movement will get it right, and such that there are ways for the community to hold them accountable to their ideals or stated goals if these appear not to be playing out in practice, or risk not doing so.
i.e. if there are good processes and systems in place, and documentation of these processes and decisions, it's more acceptable (because other organisations that probably have very good due diligence processes also missed it). But if there weren't good processes, or if these decisions weren't careful + intentional, then that's comparatively more concerning, especially in the context of specific criticisms that have been raised,[1] or previous precedent. For example, I'd be especially curious about the events surrounding Ben Delo,[2] and the processes that were implemented in response. I'd be curious about whether there are people in EA orgs involved in steering who keep track of potential risks and early warning signs to the EA movement, in the same way the EA community advocates for in the case of pandemics, AI, or even general ways of finding opportunities for impact. For example, SBF, who is listed as an EtG success story on 80,000 Hours, has publicly stated he's willing to go 5x over the Kelly bet, and described yield farming in a way that Matt Levine interpreted as a Ponzi. Again, I'm personally less interested in the object-level decision (e.g. whether we should have taken SBF's Kelly bet comments as serious, or whether Levine's interpretation was appropriate), and more interested in what the process was, how this was considered at the time with the information they had, etc. I'd also be curious about the documentation of any SBF-related concerns that were raised by the community, if any, and how these concerns were managed and considered (as opposed to critiquing the final outcome).
Outside of due diligence and ways to facilitate whistleblowers, decision-making processes around the steering of the EA movement are crucial as well. When decisions are made by orgs that bring clear benefits to one part of the EA community while bringing clear risks that are shared across wider parts of the EA community,[3] it would probably be of value to look at how these decisions were made and what tradeoffs were considered at the time of the decision. Going forward, it would be worth thinking about how to either diversify those risks, or make decision-making more inclusive of a wider range of stakeholders,[4] keeping in mind the best interests of the EA movement as a whole.
(this is something I'm considering working on in a personal capacity along with the OP of this post, as well as some others - details to come, but feel free to DM me if you have any thoughts on this. It appears that CEA is also already considering this)
If this comment is about "are these red-teaming contests in fact valuable for the money and time put into them, if they miss problems like this"
I think my view here (speaking only for the red-teaming contest) is that even if this specific contest was framed in a way that missed these classes of issues, the value of the very top submissions[5] may still have made the efforts worthwhile. The potential value of a different framing was mentioned by another panelist. If it's the case that red-teaming contests are systematically missing this class of issues regardless of framing, then I agree that would be pretty useful to know, but I don't have a good sense of how we would try to investigate this.
This tweet seems to have aged particularly well. Despite supportive comments from high-profile EAs on the original forum post, the author seemed disappointed that nothing came of it in that direction. Again, without getting into the object-level discussion of the claims of the original paper, it's still worth asking questions about the processes. If there were actions planned, what did they look like? If not, was that because of a disagreement over the suggested changes, or over the extent to which it was an issue at all? How were these decisions made, and what was considered?
Apparently a previous EA-aligned billionaire (and possibly donor) who got rich by starting a crypto trading firm, and who pleaded guilty to violating the Bank Secrecy Act.
Even before this, I had heard from a primary source in a major mainstream global health organisation that there were staff who wanted to distance themselves from EA because of misunderstandings around longtermism.
This doesn't have to be a lengthy deliberative consensus-building project, but it should at least include internal comms across different EA stakeholders to allow discussions of risks and potential mitigation strategies.
As requested, here are some submissions that I think are worth highlighting, or that I considered awarding but which ultimately did not make the final cut. (This list is non-exhaustive, and should be taken more lightly than the Honorable mentions, because by definition these posts are less strongly endorsed by those who judged them. Also commenting in personal capacity, not on behalf of other panelists, etc):
Bad Omens in Current Community Building
I think this was a good-faith description of some potential / existing issues that are important for community builders and the EA community, written by someone who "did not become an EA" but chose to go to the effort of providing feedback with the intention of benefitting the EA community. While these problems are difficult to quantify, they seem important if true, and pretty plausible based on my personal priors/limited experience. At the very least, this starts important conversations about how to approach community building that I hope will lead to positive changes, and a community that continues to strongly value truth-seeking and epistemic humility, which is personally one of the benefits I've valued most from engaging in the EA community.
Seven Questions for Existential Risk Studies
It's possible that the length and academic tone of this piece detracts from the reach it could have, and it (perhaps aptly) leaves me with more questions than answers, but I think the questions are important to reckon with, and this piece covers a lot of (important) ground. To quote a fellow (more eloquent) panelist, whose views I endorse: "Clearly written in good faith, and consistently even-handed and fair - almost to a fault. Very good analysis of epistemic dynamics in EA." On the other hand, this is likely less useful to those who are already very familiar with the ERS space.
Most problems fall within a 100x tractability range (under certain assumptions)
I was skeptical when I read this headline, and while I'm not yet convinced that a 100x tractability range should be used as a general heuristic when thinking about tractability, I certainly updated in this direction, and I think this is a valuable post that may help guide cause prioritisation efforts.
The Effective Altruism movement is not above conflicts of interest
I was unsure about including this post, but I think it highlights an important risk of the EA community receiving a significant share of its funding from a few sources, both for internal community epistemics/culture considerations as well as for external-facing and movement-building considerations. I don't agree with all of the object-level claims, but I think these issues are important to highlight and plausibly relevant outside of the specific case of SBF / crypto. That it wasn't already on the forum (afaict) also contributed to its inclusion here.
I'll also highlight one post that was awarded a prize, but I thought was particularly valuable:
Red Teaming CEA’s Community Building Work
I think this is particularly valuable because of the unique and difficult-to-replace position that CEA holds in the EA community, and as Max acknowledges, it benefits the EA community for important public organisations to be held accountable (and to a standard that is appropriate for their role and potential influence). Thus, even if the listed problems aren't all fully on the mark, or are less relevant today than when the mistakes happened, a thorough analysis of these mistakes and an attempt at providing reasonable suggestions at least provides a baseline against which CEA can be held accountable for similar future mistakes, and helps with assessing trends and patterns over time. I would personally be happy to see something like this on at least a semi-regular basis (though I am unsure about exactly what time-frame would be most appropriate). On the other hand, it's important to acknowledge that this analysis is possible in large part because of CEA's commitment to transparency.
Hey team - are you happy to share a bit more about who would be involved in these projects, and their track record (or Whylome's more broadly)? I only spent a minute or so on this but I can't find any information online beyond your website and these links, related to SMTM's "exposure to subclinical doses of lithium is responsible for the obesity epidemic" hypothesis (1, 2).
More info on how much money you're looking for the above projects would also be useful.
Ah my bad, I meant extreme pain above there as well, edited to clarify! I agree it's not a super important assumption for the BOTEC in the grand scheme of things though.
However, if one wants to argue that I overestimated the cost-effectiveness of SWP, one has to provide reasons for my guess overestimating the intensity of excruciating pain.
I don't actually argue for this in either of my comments.[1] I'm just saying that it sounds like if I duplicated your BOTEC, and changed this one speculative parameter to 2 OOMs lower, an observer would have no strong reason to choose one BOTEC over another just by looking at the BOTEC alone. Expressing skepticism of an unproven claim doesn't produce a symmetrical burden of proof on my end!
Mainly just from a reasoning transparency point of view, I think it's worth fleshing out what these assumptions imply and what is grounding these best guesses[2] - in part because I personally want to know how much I should update based on your BOTEC, in part because knowing where your ratio came from might help me better argue why you might (or might not) have overestimated the intensity of excruciating pain (which is why I was checking the maths, seeing if these were correct, and asking if there's stronger evidence, before critiquing the 100k figure), and in part because I think other EAF readers, as well as the broader, lower-context audiences of EA bloggers, would benefit from this too.
If you did that, SWP would still be 434 (= 43.4*10^3*10^3/(100*10^3)) times as cost-effective as GiveWell's top charities.
Yeah, I wasn't making any inter-charity comparisons or claiming that SWP is less cost-effective than GW top charities![3] But since you mention it, it wouldn't be surprising to me if losing 2 OOMs made some donors favour other animal welfare charities over SWP, for example - but again, the primary purpose of these comments is not to litigate which charity is best, or whether this is better or worse than GW top charities, but mainly just to explore a bit more of what is grounding the BOTEC, so observers have a good sense of how much they should update based on how compelling they find the assumptions / reasoning etc.
I think it is also worth wondering about whether you truly believe that updated intensity. Do you think 1 day of fully healthy life plus 86.4 s (= 0.864*100*10^3/100) of scalding or severe burning events in large parts of the body, dismemberment, or extreme torture would be neutral?
Nope! I would rather give up 1 day of healthy life than 86 seconds of this description. But this varies depending on the timeframe in question.
For example, I'd probably be willing to endure 0.86 seconds of this for 14 minutes of healthy life, and I would definitely rather endure 0.086 seconds of this than give up 86 seconds of healthy life.
And using your assumptions (ratio of 100k), I would easily rather have 0.8 seconds of this than give up 1 day of healthy life, but if I had to endure many hours of this I could imagine my tradeoffs approaching, or even exceeding 100k.
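To make the arithmetic behind these tradeoffs explicit, here's a minimal sketch of the conversion (my own illustration, not part of the original BOTEC; it assumes the intensity ratio is constant regardless of duration, which as I note above may not hold):

```python
# Minimal sketch: convert a duration of fully healthy life into the duration of
# excruciating pain that would be (dis)valued equally under a given intensity ratio.
# Assumes the ratio is constant regardless of duration (which, as noted above, may not hold).
def equivalent_pain_seconds(healthy_seconds: float, intensity_ratio: float) -> float:
    return healthy_seconds / intensity_ratio

DAY = 24 * 60 * 60  # 86,400 seconds

print(equivalent_pain_seconds(DAY, 100_000))  # ~0.864 s at the BOTEC's 100k ratio
print(equivalent_pain_seconds(DAY, 1_000))    # 86.4 s at a ratio 2 OOMs lower
```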
I do want to mention that I think it's useful that someone is trying to quantify these comparisons, I'm grateful for this work, and I want to emphasise that these are about making the underlying reasoning more transparent / understanding the methodology that leads to the assumptions in the BOTEC, rather than any kind of personal criticism!
So I suppose I would be wary of saying that GiveDirectly now have 3–4x the WELLBY impact relative to Vida Plena—or even to say that GiveDirectly have any more WELLBY impact relative to Vida Plena
Ah right - yeah I'm not making either of these claims. I'm just saying that if the previous claim (from VP's predictive CEA) was that "Vida Plena...is 8 times more cost-effective than GiveDirectly", and GD has since been estimated to be 3-4x as cost-effective as it was at the time the predictive CEA was published, then we should discount the 8x claim downwards somewhat (but not necessarily by 3-4x).
I think one could probably push back on whether 7.5 minutes of [extreme] pain is a reasonable estimate for a person who dies from malaria, but I think the bigger potential issue is still that the result of the BOTEC seems highly sensitive to the "excruciating pain is 100,000 times worse than fully healthy life is good" assumption - for both air asphyxiation and ice slurry, the time spent under excruciating pain makes up more than 99.96% of the total equivalent loss of healthy life.[1]
I alluded to this on your post, but I think your results imply you would prefer to avert 10 shrimp days of excruciating pain (e.g. air asphyxiation / ice slurry) over saving 1 human life (51 DALYs).[2]
If I use your assumption and also value human excruciating pain as 100,000 times as bad as healthy life is good,[3] then this means you would prefer to avert 10 shrimp days of excruciating pain (using your air asphyxiation figures) over 4.5 human hours of excruciating pain,[4] and your shrimp to human ratio is less than 50:1 - that is, you would rather avert 50 shrimp minutes of excruciating pain than 1 human minute of excruciating pain.
To be clear, this isn't a claim that one shouldn't donate to SWP, but just that if you do bite the bullet on those numbers above then I'd be keen to see some stronger justification beyond "my guess" for a BOTEC that leads to results that are so counterintuitive (like I'm kind of assuming that I've missed a step or OOMs in the maths here!), and is so highly sensitive to this assumption.[5]
Air asphyxiation: 1 - (5.01 / 12,605.01) = 0.9996
Ice slurry: 1 - (0.24 / 604.57) = 0.9996
1770 * 7.5 = 13275 shrimp minutes
13275 / 60 / 24 = 9.21875 shrimp days
There are arguments in either direction, but that's probably not a super productive line of discussion.
51 * 365.25 * 24 * 60 = 26,823,960 human minutes
26,823,960 / 100,000 = 268.2396 human minutes of excruciating pain
268.2396 / 60 = 4.47 human hours of excruciating pain
13275 / 268.2396 = 49.49 (shrimp : human ratio)
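For anyone who wants to check the arithmetic in the footnotes above, here's the same calculation as a short Python sketch (every figure is taken from the footnotes; nothing new is assumed):

```python
# Reproducing the footnote arithmetic; all numbers are from the footnotes above.
shrimp_minutes = 1_770 * 7.5              # 13,275 shrimp minutes of excruciating pain
shrimp_days = shrimp_minutes / 60 / 24    # ~9.22 shrimp days

human_life_minutes = 51 * 365.25 * 24 * 60         # 26,823,960 minutes of healthy life (51 DALYs)
human_pain_minutes = human_life_minutes / 100_000  # ~268.24 equivalent minutes of excruciating pain
human_pain_hours = human_pain_minutes / 60         # ~4.47 hours

ratio = shrimp_minutes / human_pain_minutes        # ~49.5, i.e. the <50:1 shrimp:human ratio
print(shrimp_days, human_pain_hours, ratio)
```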
Otherwise I could just copy your entire BOTEC, and change the bottom figure to 1000 instead of 100k, and change your topline results by 2 OOMs.
- Annoying pain is 10% as intense as fully healthy life.
- Hurtful pain is as intense as fully healthy life.
- Disabling pain is 10 times as intense as fully healthy life.
- Excruciating pain is 100 k times as intense as fully healthy life.
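Putting those intensity weights together with the air asphyxiation figures in the footnotes above, a quick sensitivity sketch (again my own illustration) shows why swapping the 100k figure for 1k moves the topline by roughly 2 OOMs:

```python
# Sensitivity check using the air asphyxiation figures above: 5.01 of the 12,605.01
# total comes from the non-excruciating categories, so the topline scales roughly
# linearly with the excruciating-pain multiplier.
non_excruciating = 5.01
excruciating_at_100k = 12_605.01 - 5.01   # contribution of excruciating pain at the 100k multiplier

total_at_100k = non_excruciating + excruciating_at_100k
total_at_1k = non_excruciating + excruciating_at_100k / 100  # multiplier lowered by 2 OOMs
print(total_at_100k / total_at_1k)  # ~96x, i.e. roughly 2 orders of magnitude
```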
Thanks for the response, and likewise - hope you've been well! (Sorry I wasn't sure if it was you or someone else on the account).
I agree that it is pretty reasonable to stick with the same benchmark, but I think this means it should be communicated accordingly, as VP are sometimes referring to a benchmark and other times referring to the GD programme, while GW are sticking to the same benchmark for their cost-effectiveness analyses, but updating their estimates of GD programmes.[1]
E.g. the predictive CEA (pg 7) referenced says:
"This means that a $1000 donation to Vida Plena would produce 58 WELLBYs, which is 8 times more cost-effective than GiveDirectly (a charity that excels in delivering cash transfers - simply giving people money - and a gold standard in effective altruism)"[2]
I think people would reasonably misinterpret this to mean you are referring to the GD programme, rather than the GW benchmark.[3] Again I know this is a v recent update and so hadn't expected it to be already updated! But just flagging this as a potential source of confusion in the future.
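To make the potential confusion concrete, here's a small arithmetic sketch (my own illustration; only the 8x and 3-4x figures come from the documents quoted above):

```python
# Illustrative arithmetic only: the same "8x" estimate read against two different comparators.
vp_multiple_vs_benchmark = 8        # the predictive CEA's figure, relative to the GW benchmark
gd_programme_vs_benchmark = (3, 4)  # GD's current programmes are ~3-4x the (pre-2024) benchmark

# Reading 1: "8x the benchmark" - unchanged by GiveWell's update, since the benchmark is held fixed.
# Reading 2: "8x GiveDirectly's current programmes" - under a naive adjustment this becomes:
naive_vs_current_gd = [vp_multiple_vs_benchmark / x for x in gd_programme_vs_benchmark]
print(naive_vs_current_gd)  # ~[2.67, 2.0], i.e. roughly 2-2.7x rather than 8x
# (As above, the right discount to VP's own estimate isn't necessarily the full 3-4x.)
```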
Separately, I just thought I'd register interest in a more up-to-date predictive CEA that comes before your planned 2026 analysis, in part because there's decent reason to do so (though I'm not making the stronger claim that this is more important than other things on your plate!), in part because 2026 is a while away, and in part because it's plausibly decision-relevant for potential donors if they're not sure the extent to which the HLI updates are applicable to VP.
"Thus, we will be using our historic benchmark until we have thought it through. For now, you can think of our benchmark as “GiveWell’s pre-2024 estimate of the impacts of cash transfers in Kenya,” with GiveDirectly’s current programs in various countries coming in at 3 to 4 times as cost-effective as that benchmark."
The summary table on the same page also just says "GiveDirectly".
To VP's credit, I think "eight times more cost-effective than the benchmark of direct cash transfers" in this post would likely be interpreted correctly in a high context setting (but I also think reasonably might not be, and so may still be worth clarifying).
Thanks for writing this post!
I feel a little bad linking to a comment I wrote, but the thread is relevant to this post, so I'm sharing in case it's useful for other readers, though there's definitely a decent amount of overlap here.
TL;DR
I personally default to being highly skeptical of any mental health intervention that claims to have a ~95% success rate and a PHQ-9 reduction of 12 points over 12 weeks, as this is a clear outlier among treatments for depression. The effectiveness figures from StrongMinds are also based on studies that are non-randomised and poorly controlled. There are other questionable methodological choices, e.g. around adjusting for social desirability bias. The topline figure of $170 per head is also possibly an underestimate: ~48% of clients were treated through SM partners in 2021, and Q2 results (pg 2) suggest StrongMinds is on track for ~79% of clients to be treated through partners in 2022, but the expenses and operating costs of the partners responsible for these clients were not included in the methodology.
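To illustrate why the $170 figure could be an underestimate, here's a rough hypothetical sketch (the partner cost per client is invented purely for illustration; only the ~$170 figure and the ~79% partner share come from the documents above):

```python
# Hypothetical illustration only: how excluding partner costs can understate cost per person.
sm_cost_per_person = 170   # StrongMinds' reported figure (its own expenses / all clients treated)
partner_share = 0.79       # ~79% of clients projected to be treated via partners in 2022

# Invented assumption for illustration: what partners themselves spend per client they treat,
# which is not captured in StrongMinds' own expenses.
hypothetical_partner_cost_per_client = 100

all_in_cost_per_person = sm_cost_per_person + partner_share * hypothetical_partner_cost_per_client
print(all_in_cost_per_person)  # 249.0 under these made-up numbers, i.e. noticeably above $170
```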
(This mainly came from a cursory review of StrongMinds documents, and not from examining HLI analyses, though I do think "we’re now in a position to confidently recommend StrongMinds as the most effective way we know of to help other people with your money" seems a little overconfident. This is also not a comment on the appropriateness of recommendations by GWWC / FP)
(commenting in personal capacity etc)
Edit:
Links to existing discussion on SM. Much of this ends up touching on discussions around HLI's methodology / analyses rather than the strength of evidence in support of StrongMinds, but I'm including them as this is ultimately relevant to the topline conclusion about StrongMinds (inclusion =/= endorsement, etc.):