I am a researcher at the Happier Lives Institute. In my work, I assess the cost-effectiveness of interventions in terms of subjective wellbeing.
- Excruciating pain is 1,000 times as bad as disabling pain[3].
- Disabling pain is 100 times as bad as hurtful pain.
- Hurtful pain is 10 times as bad as annoying pain.
Call me a linearity-pilled likert-maxer, but this seems a bit wild. I’ve previously read the articles you linked to, and I think it’s plausible that intense suffering can be way worse than we’d be inclined to imagine, but I don’t think it’s obvious by any means, or that it necessarily implies the profound possibilities of suffering indicated here.
After some introspection, which seems near state of the art on this question, I’d guess a range of hedonic experience of -1000 to 100 where:
If I take your scale seriously (linearized below) and normalize hurtful pain to -5 (which would be more compatible with your assumptions), then I have to accept that my worst experienced suffering (which certainly felt like I was maxing out my psychological capacity for distress) was 0.12% of what’s possible. That strikes me as a bit odd. But hey, intuitions differ.
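To make the linearization concrete, here’s a rough sketch of the arithmetic I have in mind. The -600 figure for my worst experience is just back-solved from the ~0.12%, purely for illustration:

```python
# Rough sketch of the linearized scale, using the ratios quoted above.
hurtful = -5                       # normalization choice
annoying = hurtful / 10            # hurtful is 10x annoying         -> -0.5
disabling = hurtful * 100          # disabling is 100x hurtful       -> -500
excruciating = disabling * 1000    # excruciating is 1,000x disabling -> -500,000

worst_experienced = -600           # back-solved from the ~0.12% figure, illustrative only
print(f"{worst_experienced / excruciating:.2%}")  # ~0.12% of the worst possible
```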
[Not answering on behalf of HLI, but I am an HLI employee]
Hi Michal,
We are interested in exploring more systematic solutions to aligning institutions with wellbeing. This topic regularly arises during strategic conversations.
Our aim is to eventually influence policy, for many of the reasons you mention. But we’re currently focusing on research and philanthropy, because there’s still a lot we need to learn about how to measure and best improve wellbeing. Before we attempt to influence how large amounts of resources are spent, I think we should be confident that our advice is sound.
The institution that I’m most interested in changing is academia. I think:
Following up on that last point, Folk and Dunn (2023) reviewed the power and pre-registration status of research on the 5 most popular media recommendations for individuals to increase their happiness. The results, captured in the figure below, are humbling.
That said, there is an organization that attempts to popularize and disseminate findings from the wellbeing literature: https://actionforhappiness.org. We haven’t evaluated them yet, but they’re on our long list. I expect they’ll be challenging to evaluate.
Hi again Jason,
When we said "Excluding outliers is thought sensible practice here; two related meta-analyses, Cuijpers et al., 2020c; Tong et al., 2023, used a similar approach" -- I can see that what we meant by "similar approach" was unclear. We meant that, conditional on removing outliers, they identify a similar or greater range of effect sizes as outliers as we do.
This was primarily meant to address the question raised by Gregory about whether to include outliers: “The cut data by and large doesn't look visually 'outlying' to me.”
To rephrase, I think that Cuijpers et al. and Tong et al. would agree that the data we cut looks outlying. Obviously, this is a milder claim than our comment could be interpreted as making.
Turning to the wider implications of these meta-analyses: as you rightly point out, they don’t have a “preferred specification” and are mostly presenting the options for doing the analysis. They present analyses with and without outlier removal in their main analysis, and they adjust for publication bias without outliers removed (which is not what we do). The first analytic choice doesn’t clearly support including or excluding outliers, and the second, if it supports any option, favors Greg's proposed approach of correcting for publication bias without outliers removed.
I think one takeaway is that we should consider surveying the literature and some experts in the field, in a non-leading way, about what choices they’d make if they didn’t have “the luxury of not having to reach a conclusion”.
It seems plausible to give some weight to analyses both with and without excluding outliers – if we can find a reasonable way to treat the 2 out of 7 publication bias correction methods that produce results suggesting the effect of psychotherapy is in fact sizably negative. We'll look into this more before our next update.
Cutting the outliers here was part of our first-pass attempt at minimising the influence of dubious effects, which we'll follow up with a Risk of Bias analysis in the next version. Our working assumption was that effects greater than ~2 standard deviations are suspect on theoretical grounds (that is, if they behave anything like SDs in a normal distribution) and are more likely to be the result of some error-generating process (e.g., data-entry error, bias) than a genuine effect.
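For concreteness, a minimal sketch of that kind of threshold rule, with made-up study labels and an assumed `effect_size` column of standardized mean differences (not our actual analysis code):

```python
import pandas as pd

# Toy data: standardized mean differences (e.g., Hedges' g) per comparison.
effects = pd.DataFrame({
    "study": ["A", "B", "C", "D"],
    "effect_size": [0.45, 0.62, 2.30, 0.15],
})

# Drop effects larger than ~2 SDs in absolute value as suspect: an SMD that
# large implies an implausibly complete separation of treatment and control.
trimmed = effects[effects["effect_size"].abs() <= 2]
print(trimmed)
```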
We'll look into this more in our next pass, but for this version we felt outlier removal was the most sensible choice.
Hi Jason,
“Would it have been better to start with a stipulated prior based on evidence of short-course general-purpose[1] psychotherapy's effect size generally, update that prior based on the LMIC data, and then update that on charity-specific data?”
1. To your first point, I think adding another layer of priors is a plausible way to do things, but given that the effects of psychotherapy in general appear to be similar to the estimates we come up with[1], it’s not clear how much this would change our estimates.
There are probably two issues with using HIC RCTs as a prior. First, the incentives that could bias results probably differ across countries; I’m not sure how this would pan out. Second, in HICs, the control group (“treatment as usual”) is probably a lot better off. In a HIC RCT, there’s not much you can do to stop someone in the control group of a psychotherapy trial from getting prescribed antidepressants. However, the standard of care in LMICs is much lower (antidepressants typically aren’t an option), so we shouldn’t be terribly surprised if control groups appear to do worse (and the treatment effect is thus larger).
“To my not-very-well-trained eyes, one hint to me that there's an issue with application of Bayesian analysis here is the failure of the LMIC effect-size model to come anywhere close to predicting the effect size suggested by the SM-specific evidence.”
2. To your second point, does our model predict charity specific effects?
In general, I think it’s a fair test of a model to say it should do a reasonable job of predicting new observations. We can’t yet discuss the forthcoming StrongMinds RCT; we’ll know how well our model predicts it once it’s released. For the Friendship Bench (FB) situation, it is true that we predict a considerably lower effect for FB than the FB-specific evidence would suggest. But this is in part because we use charity-specific evidence to inform both our prior and the data. Let me explain.
We have two sources of charity-specific evidence. First, we have the RCTs, which are based on a charity’s programme but not as it’s deployed at scale. Second, we have monitoring and evaluation (M&E) data, which can show how well the charity’s intervention is implemented in the real world. We don’t have a psychotherapy charity at present with RCT evidence of the programme as it’s deployed in the real world. This matters because I think placing a very high weight on the charity-specific evidence would require that it has high ecological validity. While the ecological validity of these RCTs is obviously higher than that of the average study, we still think it’s limited. I’ll explain our concern with FB.
For Friendship Bench, the most recent RCT (Haas et al. 2023, n = 516) reports an attendance rate of around 90% to psychotherapy sessions, but the Friendship Bench M&E data reports an attendance rate more like 30%. We discuss this in Section 8 of the report.
So for Friendship Bench we have a couple of reasonable-quality RCTs, but it seems, based on the M&E data, that something is off with the implementation. This evidence of lower implementation quality should be adjusted for, which we do. But we include this adjustment in the prior. So we’re injecting charity-specific evidence into both the prior and the data. Note that this is part of the reason why we don’t think it’s wild to place a decent amount of weight on the prior. This is something we should probably clean up in a future version.
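As a stylized illustration of how a prior and charity-specific data get combined, here is a generic precision-weighted (normal-normal) update with placeholder numbers. These are not the actual figures or the exact model from the report:

```python
# Placeholder numbers, purely illustrative.
prior_mean, prior_se = 0.4, 0.10   # e.g., LMIC evidence after implementation adjustments
data_mean, data_se = 0.9, 0.20     # e.g., a charity-specific RCT estimate

prior_precision = 1 / prior_se**2
data_precision = 1 / data_se**2

# The posterior is a precision-weighted average: the noisier source gets less weight.
posterior_mean = (prior_mean * prior_precision + data_mean * data_precision) / (
    prior_precision + data_precision
)
posterior_se = (prior_precision + data_precision) ** -0.5
print(round(posterior_mean, 2), round(posterior_se, 2))  # 0.5 0.09
```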
We can’t discuss the details of the Baird et al. RCT until it’s published, but we think there may be an analogous situation to Friendship Bench where the RCT and M&E data tell conflicting stories about implementation quality.
This is all to say, judging how well our predictions fare when predicting the charity-specific effects isn’t entirely straightforward, since we are trying to predict the effects of the charity as it is actually implemented (something we don’t directly observe), not simply the effects from an RCT.
If we try to predict the RCT effects for Friendship Bench (which have much higher attendance than the "real" programme), then the gap between the predicted and actual RCT effects is much smaller, but it still suggests that we can’t completely explain why the Friendship Bench RCTs find their large effects.
So, we think the error in our prediction isn't quite as bad as it seems if we're predicting the RCTs, and stems in large part from the fact that we are actually predicting the charity implementation.
Cuijpers et al. 2023 find an effect of psychotherapy of 0.49 SDs for studies with low risk of bias in low-, middle-, and high-income countries (comparisons = 218), and Tong et al. 2023 find an effect of 0.69 SDs for studies with low risk of bias in non-western countries (primarily low- and middle-income; comparisons = 36). Our estimate of the initial effect is 0.70 SDs (before publication bias adjustments). The results tend to be lower (between 0.27 and 0.57, or 0.42 and 0.60, SDs) when the authors of the meta-analyses correct for publication bias. In both meta-analyses (Tong et al. and Cuijpers et al.) the authors present the effects after using three publication bias correction methods: trim-and-fill (0.6; 0.38 SDs), a limit meta-analysis (0.42; 0.28 SDs), and a selection model (0.49; 0.57 SDs). If we averaged their publication-bias-corrected results (which they produce without removing outliers beforehand), the estimated effect of psychotherapy would be 0.5 SDs and 0.41 SDs for the two meta-analyses. Our estimate of the initial effect (which is most comparable to these meta-analyses), after removing outliers, is 0.70 SDs, and our publication bias correction is 36%, implying that we estimate the effect to be 0.46 SDs. You can play around with the data they use on the metapsy website.
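A quick arithmetic check of the figures above (the two lists correspond to the two meta-analyses, in the order the pairs are quoted):

```python
# Publication-bias-corrected effects: trim-and-fill, limit meta-analysis, selection model.
first_quoted = [0.60, 0.42, 0.49]
second_quoted = [0.38, 0.28, 0.57]

print(round(sum(first_quoted) / 3, 2))   # 0.5
print(round(sum(second_quoted) / 3, 2))  # 0.41

# Our initial effect with our publication bias correction applied.
print(round(0.70 * (1 - 0.36), 2))       # 0.45, roughly the 0.46 SDs quoted (rounded inputs)
```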
Hi Victor,
The updated operationalization of psychotherapy we use in our new report (page 12) is:
"For the purposes of this review, we defined psychotherapy as an intervention with a structured, face-to-face talk format, grounded in an accepted and plausible psychological theory, and delivered by someone with some level of training. We excluded interventions where psychotherapy was one of several components in a programme."
So basically this is "psychotherapy delivered to groups or individuals by anyone with some amount of training".
Does that clarify things?
Also, you should be able to use our new model to calculate the WELLBYs of more traditional 1-to-1 psychotherapy, since we include 1-to-1 studies in our model. Friendship Bench, for instance, uses that model (albeit with lay mental health workers with relatively brief training). Note that in this update our finding about group versus individual therapy has reversed: we now find 1-to-1 is more effective than group delivery (page 33). This is a bit of a puzzle, since it disagrees somewhat with the broader literature, but we haven't had time to look into it further.
They only include costs to the legal entity of StrongMinds. To my understanding, this includes a relatively generous stipend provided to the community health workers and teachers who are "volunteering" to deliver StrongMinds' programme, as well as the grants StrongMinds makes to NGOs to support their delivery of StrongMinds programs.
Note that 61% of their partnership treatments are through these volunteer+ arrangements with community health workers (CHWs) and teachers. I'm not too worried about this, since I'm pretty sure there aren't meaningful additional costs to consider: these partnership treatments appear to be based on individual CHWs and teachers opting in. I also don't think that delivering psychotherapy is meaningfully leading them to do less of their core health or educational work.
I'd be more concerned if these treatments were happening because a higher authority (say school administrators) was saying "Instead of teaching, you'll be delivering therapy". The costs to deliver therapy could then reasonably seen to include the teacher's time and the decrease in teaching they'd do.
But what about the remaining 39% of partnerships (representing 24% of total treatments)? These are through NGOs. I think that 40% of these are delivered because StrongMinds is giving grants to the NGOs to deliver therapy in areas that StrongMinds can't reach for various reasons. The other 60% of NGO cases appear to be instances where the NGO is paying StrongMinds to train it to deliver psychotherapy. The case for causally attributing these treatments to StrongMinds seems more dubious, and I haven't gotten all the information I'd like, so to be conservative I assumed that none of the cases StrongMinds claims as its own are attributable to it. This increases the costs by around 14%[1], because it reduces the total number treated by around 14%.
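The ~14% comes out of a simple multiplication; a sketch of my arithmetic, using the shares quoted above:

```python
ngo_share_of_total_treatments = 0.24   # NGO partnerships as a share of all treatments
share_ngo_pays_for_training = 0.60     # NGO cases where the NGO pays StrongMinds to train it

excluded_share = ngo_share_of_total_treatments * share_ngo_pays_for_training
print(f"{excluded_share:.0%} of total treatments excluded")  # ~14%
```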
Some preemptive hedging: I think my approach so far is reasonable, but my world wouldn't be rocked if I was later convinced this isn't quite the way to think about incorporating costs in a situation with more decentralized delivery and more unclear causal attribution for treatment.
"But 1.14 * 59 is 67, not 63!" Indeed. The cost we report is lower than $67 because we include an offsetting 7.4% discount to the costs to harmonize the cost figures of StrongMinds (which is more stringent about who counts as treated: more than half of sessions must be completed) with Friendship Bench (which counts anyone receiving at least 1 session as treated). So 59 * (1 - 0.074) * 1.14 comes to roughly $63. See page 69 of the report for the section where we discuss this.
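For the record, the cost arithmetic as I've described it (the inputs here are rounded, so it lands a touch under the $63 reported):

```python
base_cost = 59                  # reported cost per person treated, before adjustments
attribution_multiplier = 1.14   # excluding the NGO-trained treatments (~14% fewer treated)
harmonisation_discount = 0.074  # aligning StrongMinds' stricter "treated" definition with Friendship Bench's

adjusted = base_cost * (1 - harmonisation_discount) * attribution_multiplier
print(round(adjusted, 1))       # ~62.3 with these rounded inputs, i.e. roughly the $63 reported
```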
Hi Nick,
Good question. I haven't dug into this in depth, so consider this primarily my understanding of the story. I haven't gone through an itemized breakdown of StrongMinds' costs on a year-by-year basis to investigate further.
It is a big drop from our previous cost figure. But I originally did the research in Spring 2021, when 2020 was the last full year, and that was a year with unusually high costs. I didn't use those costs because I assumed they were mostly a pandemic-related aberration, but I wasn't sure how long they'd keep the more expensive practices, like teletherapy, that they started during COVID (programmes can be sticky). But they did pause their expensive teletherapy programme this year because of cost concerns (p. 5).
So $63 is a big change from $170, but a smaller change from $109 -- their pre-COVID costs.
What else accounts for the drop, though? I think "scale" is a plausible explanation. The first part of the story is fixed/overhead costs being spread over a larger number of people treated, with variable (per-person) costs remaining stable. StrongMinds spends at least $1 million on overhead costs (office, salaries, etc.). The more people are treated, the lower the per-person cost (all else equal). The second part of the story is that I think it's plausible that variable costs (i.e., training and supporting the person delivering the therapy) are also decreasing. They've also shifted away from a staff-centric delivery model towards using more volunteers (e.g., community health workers), which likely depresses costs somewhat further. We discuss their scaling strategy, and the complexities it introduces into our analysis, a bit more around page 70 of the report.
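A toy illustration of the fixed-cost part of the story (the $1 million overhead figure is from above; the variable cost is a placeholder, not StrongMinds' actual cost structure):

```python
fixed_costs = 1_000_000          # overhead: office, central salaries, etc.
variable_cost_per_person = 40    # training and supporting the person delivering therapy (placeholder)

for n_treated in (10_000, 50_000, 100_000):
    cost_per_person = fixed_costs / n_treated + variable_cost_per_person
    print(n_treated, round(cost_per_person))
# 10,000 -> 140; 50,000 -> 60; 100,000 -> 50: per-person cost falls as the programme scales
```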
Below I've attached StrongMinds' most recent reporting on their number treated and cost per person treated, which gives a decent overall picture of how the costs and the number treated have changed over time.
Sounds about right. 10 minutes means a bad life; 5 minutes means a life still worth living.
Good to know. It seems to mostly make a difference for humans and cows, going from ~0 to ~5% of disability in your model?
And I appreciate you continuing to bang the drum for animal welfare stuff. It's made me think about it more.