
MichaelPlant

8367 karma

Bio

I'm the Director of the Happier Lives Institute and a Postdoctoral Research Fellow at Oxford's Wellbeing Research Centre. I'm a philosopher by background and did my DPhil at Oxford, primarily under the supervision of Peter Singer and Hilary Greaves. I've previously worked for an MP and failed to start a start-up.

Comments
756

Hello LondonGal (sorry, I don't know your real name). I'm glad that, after your recent scepticism, you looked further into subjective wellbeing data and think it can be useful. You've written a lot and I won't respond to it in detail. 

I think the most important points to make are (1) there is a lot more research than you suggest and (2) it didn't just start around COVID. 

You are right that, if you search for "subjective wellbeing", not much comes up (I get 706 results on PubMed). However, that's because the trend among researchers to refer to "subjective wellbeing" rather than "subjective well-being", ie with a hyphen, is very recent (and, AFAIK, is unrelated to COVID). Searching for "subjective well-being" yields, by comparison, 4,806 results. 

If I expand the search to other keywords, namely "happiness" OR "life satisfaction" OR "subjective wellbeing" OR "subjective well-being", I get over 150,000 results on PubMed. This is displayed below. Note the results go back to 1838, but the research only really kicks off after 1980. 
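For anyone who wants to re-run this kind of keyword search, here is a minimal sketch of how the OR query could be assembled and pointed at NCBI's public E-utilities `esearch` endpoint. The helper names (`build_term`, `build_count_url`) are hypothetical, the sketch only builds the request URL rather than fetching it, and counts returned today will differ from the figures quoted above.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (public; PubMed's web search may behave slightly differently).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The four phrases mentioned in the comment above.
PHRASES = [
    "happiness",
    "life satisfaction",
    "subjective wellbeing",
    "subjective well-being",
]


def build_term(phrases):
    """Quote each phrase and join them with OR (hypothetical helper)."""
    return " OR ".join(f'"{p}"' for p in phrases)


def build_count_url(phrases):
    """Build an esearch URL that asks only for the number of matching records."""
    params = {
        "db": "pubmed",
        "term": build_term(phrases),
        "rettype": "count",
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_term(PHRASES))
    print(build_count_url(PHRASES))
```

Opening the printed URL in a browser should return a small JSON document containing the hit count for the combined query.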

I'm not an expert in academic databases, so I don't know how comprehensive PubMed is of all research, but I'm guessing it's a subset. FWIW, Ed Diener et al., in a 2018 article on subjective wellbeing, state that there were "170,000 articles and books published on the topic in the past 15 years", although I haven't looked into their numbers. 

You might be interested in this article in the recent World Happiness Report, which looks at various trends related to happiness, including academic interest, and finds the fraction of articles on the topic has been trending up since the 80s: note the steady linear increase on the logarithmic y-axis.

Hence, as you suspected, many of the topics you raise here are reasonably well-trodden in the literature. 

If you're interested in looking further at the pros and cons of the WELLBY, the easiest thing for me to point you to is HLI's To WELLBY Or Not To WELLBY report and references therein. You may also find this reading list useful. 

In terms of the state of the literature, if you'll forgive further laziness and self-promotion, I'd suggest my EAG London talk. The short answer is that 'we' (ie happiness researchers) know quite a bit about the nature and measurement of wellbeing and its causes and correlates, but relatively little about what the best ways are to increase it; work on WELLBY cost-effectiveness is barely older than COVID. 

Hello Linch. We're reluctant to recommend organisations that we haven't been able to vet ourselves but are planning to vet some new mental health and non-mental health organisations in time for Giving Season 2023. The details are in our Research Agenda. For mental health, we say

We expect to examine Friendship Bench, Sangath, and CorStone unless we find something more promising.


On how we chose StrongMinds, you've already found our selection process. Looking back at the document, I see that we don't get into the details, but it wasn't just procedural. We hadn't done a deep-dive analysis at that point (the purpose of the search process was to work out what we should look at in more depth) but our prior was that StrongMinds would come out at or close to the top anyway. To explain, it was delivering the intervention we thought would do most good per person (therapy for depression), doing this cheaply (via lay-delivered interpersonal group therapy), and it seemed to be a well-run organisation. I thought Friendship Bench might beat it (Friendship Bench had a volunteer model and so plausibly much lower costs but also lower efficacy) but they didn't offer us their data at the time, something they've since done. I don't think I knew about Sangath or CorStone back then. 

I think I would advise donors to wait until the end of this year. However, my money would be on Friendship Bench being the best MH org that isn't StrongMinds and I wouldn't rule out it being more cost-effective.

This was really helpful, thanks! I'll discuss it with the team.

[I don’t plan to make any (major) comments on this thread after today. It’s been time-and-energy intensive and I plan to move back to other priorities]

Hello Jason,

I really appreciated this comment: the analysis was thoughtful and the suggestions constructive. Indeed, it was a lightbulb moment.  I agree that some people do have us on epistemic probation, in the sense they think it’s inappropriate to grant the principle of charity, and should instead look for mistakes (and conclude incompetence or motivated reasoning if they find them).

I would disagree that HLI should be on epistemic probation, but I am, of course, at risk of bias here, and I’m not sure I can defend our work without coming off as counter-productively defensive! That said, I want to make some comments that may help others understand what’s going on so they can form their own view, then set out our mistakes and what we plan to do next.

Context

I suspect that some people have had HLI on epistemic probation since we started - for perhaps understandable reasons. These are: 

  1. We are advancing a new methodology, the happiness/SWB/WELLBY approach. Although there are decades of work in social science on this and it’s now used by the UK government, this was new to most EAs and they could ask, "if it's so good, why aren't we already doing it?" Of course, new ideas have to start sometime.
  2. HLI is a second-generation EA org that is setting out to publicly re-assess some conclusions of an existing (understandably!) well-beloved first-generation org, GiveWell. I can't think of another case like this; usually, EA orgs do non-overlapping work. Some people have welcomed us offering a different perspective, others have really not liked it; we’ve clearly ruffled some feathers.
  3. As a result of 1 and 2, there is something of a status quo effect and scepticism that wouldn’t be the case if we were offering recommendations in a new area for the first time. To illustrate, suppose you know nothing about global health and wellbeing and someone tells you they’ve done lots of research based on happiness measures and they’ve found cash transfers are good, treating depression is about 7x as good as cash, deworming has no clear long-run effect, and life-saving bednets are 1-8x cash depending on difficult moral assumptions. I expect most people would say “yeah, that seems reasonable” rather than “why are you engaged in motivated reasoning?”. 

Our mistakes (so far)

The discussion in this thread has been a bit vague about what mistakes HLI has made that have led to suspicion. I want to set out what, from my perspective, those are. I reserve the right to add things to this list! We'll probably put a version of this on our website. 

1. Not modelling spillovers in our cash vs psychotherapy meta-analyses.

This was the first substantive empirical criticism we received. We had noted in the original report that not including spillovers was a limitation in the analysis, but we hadn’t explicitly modelled them. This was for a couple of reasons. We hadn’t seen any other EA org empirically model spillovers, so it seemed a non-standard thing to do, and the data were low-quality anyway, so we hadn’t thought much about including them. We were surprised when some claimed this was a serious (possibly deliberate) omission. 

That said, we took the objection very seriously and reallocated several months of staff time in early 2022 from other topics to produce the best spillovers analysis we could on the available data, which we then shared with others. In the end, it only somewhat reduced the result (therapy went from 12x cash to 9x).

2. We were too confident and clumsy in our 2022 Giving Season post.

At that point, we had incorporated nearly all the available data into our cash and psychotherapy meta-analyses, accounted for spillovers, and looked at deworming (for which long-term effects on wellbeing are non-significant) and life-improving vs life-saving interventions (where psychotherapy seemed better under almost all assumptions). So we felt proud of our work and quite confident.

In retrospect, as I've alluded to before, we were overconfident, our language and execution were clumsy, and this really annoyed some people. I'm sorry about this and I hope people can forgive us. We have since spent some time internally thinking about how to communicate our confidence in our conclusions.

3. Not communicating better how we’d done our meta-analysis of psychotherapy, including that we hadn’t taken StrongMinds’ own studies at face value. 

SimonM’s post has been mentioned a few times in this thread. As I mentioned in point 3 here, SimonM criticised the recommendation of StrongMinds based on concerns about StrongMinds’ own study, not our analysis. He said he didn’t engage with our analysis because he was ‘confused’ about methodology but that, in any case “key thing about HLI methodology is that [it] follows the same structure as the Founders Pledge analysis and so all the problems I mention above regarding data apply just as much to them as FP”. However, our evaluation didn’t have the problems he was referring to because of how we’d done the meta-analysis.

In retrospect, it seems the fact we’d done a meta-analysis, and not put much weight on StrongMinds’ own study, wasn’t something people knew, and we should have communicated that much more prominently; it was buried in some super long posts. We need to own our inadequate comms there. It was tough to learn he and some other members of EA have been thinking of us with such suspicion. Psychologically, the team took this very hard. 

4. We made some errors in the spillovers analysis (as pointed out by James Snowden).

The main error here was that, as my colleague Joel conceded (“I blundered”), he coded some data the wrong way and this reduced the result from 9x to 7.5x cash transfers. This is embarrassing but not, I think, sinister by itself. These things happen, they’re awkward, but not well explained by motivated reasoning: coding errors are checkable and, in any case, the overall conclusion is unchanged with the correction (see my comment here too).

I recognise that some will think this a catalogue of errors best explained by a corrupting agenda; the reader must make up their own mind. Two of the four are analysis errors of the sort that routinely appear when researchers review each other's work. Two are errors in communication, either about being overconfident, or not communicating enough. 

Next steps:

Jason suggests those on epistemic probation should provide a credible exit plan. Leaving aside whether we are, or should be, on epistemic probation, I am happy to set out what we plan to do next. For our research regarding reevaluating psychotherapy, we had already set this out in our new research agenda, at Section 2.1, which we published at the same time as this post. We are still committed to digging into the details of this analysis that have been brought up.

About bounties: I like this idea and wish we could implement it, but in light of our funding position, I don’t think we’ll be able to do so in the near-term.

In addition, we’ll consider adding something like an ‘Our mistakes’ page to our website to chronicle our blunders. At the least, we’ll add a version history to our cost-effectiveness analysis so people can see how the numbers have changed over time and why. 

I am open to - indeed, I welcome - further constructive suggestions about what work people would like us to do to change their minds and/or reassure them. I do ask that these are realistic: as noted, we are a small, funding-and-capacity-constrained team with a substantial research agenda.  We therefore might not be able to take all suggestions on board. 

Hello Gregory. With apologies, I’m going to pre-commit to making this my last reply to you on this post. This thread has been very costly in terms of my time and mental health, and your points below are, as far as I can tell, largely restatements of your earlier ones. As briefly as I can, and point by point again.

1. 

A casual reader looking at your original comment might mistakenly conclude that we only used StrongMinds’ own study, and no other data, for our evaluation. Our point was that SM’s own work has relatively little weight, and we rely on many other sources. At this point, your argument seems rather ‘motte-and-bailey’. I would agree with you that there are different ways to do a meta-analysis (your point 3), and we plan to publish our new psychotherapy meta-analysis in due course so that it can be reviewed.

2. 

Here, you are restating your prior suggestions that HLI should be taken in bad faith. Your claim is that HLI is good at spotting errors in others’ work, but not its own. But there is an obvious explanation in terms of 'survivorship' effects. If you spot errors in your own research, you strip them out. Hence, by the time you publish, you’ve found all the ones you’re going to find. This is why peer review is important: external reviewers will spot the errors that authors have missed themselves. Hence, there’s nothing odd about having errors in your own work but also finding them in others’. This is the normal stuff of academia!

3.

I’m afraid I don’t understand your complaint. I think your point is that “any way you slice the meta-analysis, psychotherapy looks more cost-effective than cash transfers” but then you conclude this shows the meta-analysis must be wrong, rather than that it’s sensible to conclude psychotherapy is better. You’re right that you would have to deflate all the effect sizes by a large proportion to reverse the result. This should give you confidence in psychotherapy being better! It’s worth pointing out that if psychotherapy is about $150pp, but cash transfers cost about $1100pp ($1000 transfer + delivery costs), therapy will be more cost-effective per intervention unless its per-intervention effect is much smaller.
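To make the cost arithmetic concrete, here is a minimal sketch using only the two per-person costs quoted above. It is an illustration, not HLI's actual cost-effectiveness model: the function names are hypothetical, and the "effect ratio" is simply therapy's per-person wellbeing effect divided by cash's, in whatever common unit is used.

```python
# Per-person costs quoted in the comment above (USD).
THERAPY_COST_PP = 150.0
CASH_COST_PP = 1100.0  # $1000 transfer + delivery costs


def break_even_effect_ratio(therapy_cost, cash_cost):
    """Effect ratio (therapy effect / cash effect) at which both cost the same per unit of effect."""
    return therapy_cost / cash_cost


def cost_effectiveness_multiple(effect_ratio, therapy_cost, cash_cost):
    """How many times more cost-effective therapy is than cash for a given effect ratio."""
    return effect_ratio * cash_cost / therapy_cost


if __name__ == "__main__":
    threshold = break_even_effect_ratio(THERAPY_COST_PP, CASH_COST_PP)
    print(f"Break-even effect ratio: {threshold:.3f}")
    equal_effect = cost_effectiveness_multiple(1.0, THERAPY_COST_PP, CASH_COST_PP)
    print(f"Multiple if per-person effects were equal: {equal_effect:.2f}x")
```

On these two cost figures alone, therapy stays ahead on cost-effectiveness unless its per-person effect falls below roughly 14% of the effect of a cash transfer.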

The explanation behind finding a new charity on our first go is not complicated or sinister. In earlier work, including my PhD, I had suggested that, on a SWB analysis, mental health was likely to be relatively neglected compared to status quo prioritising methods. I explained this in terms of the existing psychological literature on affective forecasting errors: we’re not very good at imagining internal suffering, we probably overstate the badness of material deprivation due to focusing illusions, and our forecasts don’t account for hedonic adaptation (which doesn’t occur for mental health). So the simple explanation is that we were ‘digging’ where we thought we were most likely to find ‘altruistic gold’, which seems sensible given limited resources. 

4.

As much as I enjoyed your football analogies, here also you’re restating, rather than further substantiating, your earlier accusations. From the fact you found some problems with HLI’s analysis, you seem to conclude that HLI, but only HLI, should be distrusted, and that we should retain our confidence in all the other charity evaluators. This seems unwarranted. Why not conclude you would find mistakes elsewhere too? I am reminded of the expression, “if you knew how the sausage was made, you wouldn’t want to eat the sausage”. What I think is true is that HLI is a second-generation charity evaluator, we are aiming to be extremely transparent, and we are proposing novel priorities. As a result, I think we have come in for a far higher level of public scrutiny than others have, so more of our errors have been found, but I don’t know that we have made more and worse errors. Quite possibly, where errors have been noticed in others’ work, they have been quietly and privately identified, and corrected with less fanfare.

Hello Jason. FWIW, I've drafted a reply to your other comment and I'm getting it checked internally before I post it.

On this comment about you not liking that we hadn't updated our website to include the new numbers: we all agree with you! It's a reasonable complaint. The explanation is fairly boring: we have been working on a new charity recommendations page for the website, at which point we were going to update the numbers and add a note, so we could do it all in one go. (We still plan to do a bigger reanalysis later this year.) However, that has gone slower than expected and hasn't happened yet. Because of your comment, we'll add a 'hot fix' update in the next week, and hopefully have the new charity recommendations page live in a couple of weeks.

I think we'd have moved faster on this if it had substantially changed the results. On our numbers, StrongMinds is still the best life-improving intervention (it's several times better than cash and we're not confident deworming has a longterm effect). You're right it would slightly change the crossover point for choosing between life-saving and life-improving interventions, but we've got the impression that donors weren't making much use of our analysis anyway; even if they were, it's a pretty small difference, and well within the margin of uncertainty. 

Hello Jack (again!),

This is because plausible person-affecting views will still find it important to improve the lives of future people who will necessarily exist.

I agree with this. But the challenge from the Non-Identity problem is that there are few, if any, necessarily existing future individuals: what we do causes different people to come into existence. This raises a challenge to longtermism: how can we make the future go better if we can't make it go better for anyone in particular? If an outcome is not better for anyone, how can it be better? In the discourse, philosophers tend to accept that it is the implication of (some) person-affecting views that we can't (really) make the future go better for anyone, but take this implication as a decisive reason to reject those views. My suspicion is that philosophers have been too quick to dismiss such person-affecting views and they merit another look. 

Hello Jack. A quick reply: I'm not sure how well the arguments for improving global wellbeing being a sensible longterm priority will stack up. I suspect they won't, on closer inspection, but it seems worth investigating at some point.

Hello Matt and thanks for your overall vote of confidence, including your comments below to Nathan. 

Could you expand on what you said here?

I may also have been a little sus early (sorry Michael) on but HLI's work has been extremely valuable

I'm curious to know why you were originally suspicious and what changed your mind. Sorry if you've already stated that below. 
