Mo Putera

Research & quantitative modelling @ ARMoR

Bio

CE/AIM Research Training Program graduate and research contractor at ARMoR under the Global Impact Placements program, working on research & quantitative modelling to support policy advocacy for market-shaping tools to help combat AMR, and also exploring similar "decision guidance" roles, e.g. applied prioritization research. Previously supported by an FTX Future Fund regrant and later Open Philanthropy's affected grantees program. Before that I spent 6 years doing data analytics, business intelligence and knowledge + project management in various industries (airlines, e-commerce) and departments (commercial, marketing), after majoring in physics at UCLA. I've also initiated some local priorities research efforts, e.g. a charity evaluation initiative with the moonshot aim of reorienting Malaysia's giving landscape towards effectiveness, albeit with mixed results.

I first learned about effective altruism circa 2014 via A Modest Proposal, a polemic on using dead children as units of currency to force readers to grapple with the opportunity costs of subpar resource allocation under triage. I have never stopped thinking about it since, although my relationship to it has changed quite a bit; I related to Tyler's personal story (which unsurprisingly also references A Modest Proposal as a life-changing polemic):

I thought my own story might be more relatable for friends with a history of devotion – unusual people who’ve found themselves dedicating their lives to a particular moral vision, whether it was (or is) Buddhism, Christianity, social justice, or climate activism. When these visions gobble up all other meaning in the life of their devotees, well, that sucks. I go through my own history of devotion to effective altruism. It’s the story of [wanting to help] turning into [needing to help] turning into [living to help] turning into [wanting to die] turning into [wanting to help again, because helping is part of a rich life].

Comments

Epistemic status: public attempt at self-deconfusion & not just stopping at knee-jerk skepticism

The recently published Cost-effectiveness of interventions for HIV/AIDS, malaria, syphilis, and tuberculosis in 128 countries: a meta-regression analysis (so recent it's listed as being published next month) aims, in my understanding, to fill country-specific gaps in CEAs for all interventions in all 128 countries across HIV/AIDS, malaria, syphilis, and tuberculosis, to help national decision-makers allocate resources effectively – to a first approximation I think of it as "like the DCP3 but at country granularity and for Global Fund-focused programs". They do this by predicting ICERs (with IQRs and 95% UIs) in US$/DALY using the meta-regression parameters obtained from analysing published ICERs for these interventions (more here).
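To make the "predicting ICERs from meta-regression parameters" step concrete, here's a toy sketch of the general idea – regress log-ICERs from published studies on country-level covariates, then predict for country-intervention pairs with no published CEA. The covariates, coefficients, and data below are all made up; this is not the paper's actual specification:

```python
import numpy as np

# Toy sketch of a meta-regression on published ICERs (NOT the paper's model):
# regress log(ICER) on country-level covariates, then predict an ICER for a
# country with no published CEA. All numbers here are fabricated placeholders.
rng = np.random.default_rng(0)

log_gdp = rng.uniform(6.5, 10.5, size=40)      # hypothetical log GDP per capita
log_burden = rng.uniform(4.0, 8.0, size=40)    # hypothetical log disease burden
log_icer = 0.9 * log_gdp - 0.4 * log_burden + rng.normal(0, 0.5, size=40)

# Fit the meta-regression by ordinary least squares.
X = np.column_stack([np.ones_like(log_gdp), log_gdp, log_burden])
beta, *_ = np.linalg.lstsq(X, log_icer, rcond=None)

# Predict for a country with no published estimate (covariates assumed).
x_new = np.array([1.0, np.log(900.0), np.log(3000.0)])
print(f"Predicted ICER: ${np.exp(x_new @ beta):,.0f} per DALY averted")
```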

AFAICT their methodology and execution seem superb, so I was keen to see their results: 

[Figure 3a from the paper]

Antenatal syphilis screening ranks as the lowest median ICER in 81 (63%) of 128 countries, with median ICERs ranging from $3 (IQR 2–4) per DALY averted in Equatorial Guinea to $3473 (2244–5222) in Ukraine.

At risk of being overly skeptical: $3 per DALY averted is >30x better than Open Phil's 1,000x bar of ~$100 per DALY, which is roughly GiveWell top-charity level and which OP has said is hard to beat, especially for a direct intervention like antenatal syphilis screening. It makes me wonder how much credence to put in the study's findings for actual resource allocation decisions (esp. Figure 4, which ranks top interventions at country granularity). Also:

  • Specifically re: antenatal syphilis screening, CE/AIM's report on screening + treating antenatal syphilis estimates $81 per DALY; I'm hard-pressed to believe that dropping the treatment component improves cost-effectiveness by more than an order of magnitude
  • I'm reminded of the time GW found 5 separate spreadsheet errors in a DCP2 estimate of soil-transmitted-helminth (STH) treatment that together misleadingly 'improved' its cost-effectiveness ~100-fold from $326.43 per DALY (correct output) to just $3.41 (wrong, and coincidentally in the ballpark of the estimate above that triggered my skepticism) 

So how should I think about and use their findings given what seems like reasonable grounds for skepticism, if I'm primarily interested in helping decision-makers help people better? Scattered thoughts to defend the study / push back on my nitpicking above:

  • even if imperfect – and I'm not confident in my skepticism above – they clearly improve substantially upon the previous state of affairs (CEA gaps everywhere at country-disease-intervention level granularity; expert opinion not lending itself to country-specific predictions; case-by-case methods often being unsuccessful)
  • their recommendations seem reasonably hedged, not naively maximalist: they include 95% uncertainty intervals; they clearly say "cost-effectiveness... should not be the only criterion... [consider also] enhancing equity and providing financial risk protection"
  • even a naively maximalist recommendation ("first fund lowest-ICER intervention, then 2nd-lowest, ... until funds run out") doesn't seem unreasonable in this context – essentially countries would end up funding more antenatal syphilis screening, intermittent preventive treatment of malaria in pregnant women and infants, and chemotherapy for drug-susceptible TB (just from eyeballing Figure 4; see the sketch after this list)
  • I interpret what they're trying to do as not so much "here are the ICER league tables, use them", but shifting decision-makers' approach to resource allocation from needing a single threshold for all healthcare funding decisions to (quoting them) "ICERs ranked in country-specific league tables", and in the long run this perspective shift seems useful to "bake into" decision-making processes, even if the specific figures in this specific study aren't necessarily the most accurate and shouldn't be taken at face value
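To spell out what that naively maximalist rule amounts to, here's a minimal sketch of funding interventions in ICER order until the budget runs out. The intervention names come from the eyeballed Figure 4 ranking above, but every ICER, cost, and the budget itself are made-up placeholders, not figures from the study:

```python
# Minimal sketch of "fund interventions in order of their ICER rankings until
# funds run out". All ICERs, costs, and the budget are made up for illustration.
interventions = [
    # (name, ICER in $/DALY averted, cost to fully fund in $)
    ("Antenatal syphilis screening", 3, 2_000_000),
    ("IPT of malaria in pregnancy/infancy", 8, 5_000_000),
    ("Chemotherapy for drug-susceptible TB", 15, 10_000_000),
    ("ART for HIV", 250, 40_000_000),
]

budget = 12_000_000
total_dalys = 0.0

# Greedy allocation: buy the cheapest DALYs first, partially funding the marginal item.
for name, icer, cost in sorted(interventions, key=lambda x: x[1]):
    spend = min(cost, budget)
    if spend == 0:
        break
    total_dalys += spend / icer
    budget -= spend
    print(f"{name}: spend ${spend:,}, avert ~{spend / icer:,.0f} DALYs")

print(f"Total: ~{total_dalys:,.0f} DALYs averted")
```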

That said, I do wonder if the authors could have done a bit better, like 

  • cautioning against naively taking the best cost-eff estimates at face value, instead of suggesting "Funds could be first spent on the intervention that has the lowest ICER. Following that, other interventions could be funded in order of their ICER rankings, as long as there are available funds"
  • spot-checking some of (not all) the top cost-eff ICERs that went into their meta-regression analysis to get a sense of their credibility, especially those which feed into their main recommendations, like GW did above with the DCP2 estimate for STH treatment 
  • extracting qualitative proxies for decision-maker guidance from an analysis of the main drivers behind the substantial ranking differences in intervention ICERs across economic and epidemiological contexts (eg "we should expect antenatal syphilis screening to be substantially less cost-effective in our context due to factors XYZ, let's look at other interventions instead" – what would a short useful list of XYZ look like?), instead of just saying "we found the rankings differ substantially"

I hadn't – thanks for the pointer, Pablo.

Curious what people think of Gwern Branwen's take that our moral circle has historically narrowed as well, not just expanded (so contra Singer), so we should probably just call it a shifting circle. His summary:

The “expanding circle” historical thesis ignores all instances in which modern ethics narrowed the set of beings to be morally regarded, often backing its exclusion by asserting their non-existence, and thus assumes its conclusion: where the circle is expanded, it’s highlighted as moral ‘progress’, and where it is narrowed, what is outside is simply defined away. 

When one compares modern with ancient society, the religious differences are striking: almost every single supernatural entity (place, personage, or force) has been excluded from the circle of moral concern, where they used to be huge parts of the circle and one could almost say the entire circle. Further examples include estates, houses, fetuses, prisoners, and graves.

(I admittedly don't find his examples all that persuasive, probably because I'm already biased to only consider beings that can feel pleasure and suffering.)

What's the "so what"? Gwern:

One of the most difficult aspects of any theory of moral progress is explaining why moral progress happens when it does, in such apparently random non-linear jumps. (Historical economics has a similar problem with the Industrial Revolution & Great Divergence.) These jumps do not seem to correspond to simply how many philosophers are thinking about ethics. 

As we have already seen, the straightforward picture of ever more inclusive ethics relies on cherry-picking if it covers more than, say, the past 5 centuries; and if we are honest enough to say that moral progress isn’t clear before then, we face the new question of explaining why things changed then and not at any point previous in the 2500 years of Western philosophy, which included many great figures who worked hard on moral philosophy such as Plato or Aristotle. 

It is also troubling how much morality & religion seems to be correlated with biological factors. Even if we do not go as far as Julian Jaynes's theories of gods as auditory hallucinations, there are still many curious correlations floating around.

Nicolaj, correct me if I'm wrong – I think it's derived here in the OP:

(Quantitatively it would be captured by a higher η when combined with the improving circumstances component. That comes from solving the last equation in Rethink Priorities' 2023 report for η, given r_GiveWell and g – i.e., assuming that the compounding non-monetary benefits factor also reflects diminishing marginal utility from income doublings. As a result I'm assuming the discount rate reflects that higher η for the remainder of the post.)

That last equation on pg 48 is r_GiveWell = (1 + δ)(1 + g)^(η − 1) − 1. δ is the pure time preference rate, for which GiveWell's choice is δ = 0%; pg 30 in the RP report above summarizes the reasoning behind this choice.
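For concreteness, here's a small sketch of what solving that equation for η looks like once the other parameters are fixed. δ = 0 follows GiveWell's choice above; the r and g values below are placeholders I've picked for illustration, not figures from the report or the OP:

```python
import math

# Solve r_GiveWell = (1 + delta) * (1 + g)**(eta - 1) - 1 for eta.
# delta = 0 is GiveWell's pure time preference choice; the r and g values
# below are illustrative placeholders, not the report's or the OP's figures.
def solve_eta(r: float, g: float, delta: float = 0.0) -> float:
    return 1 + (math.log(1 + r) - math.log(1 + delta)) / math.log(1 + g)

print(round(solve_eta(r=0.04, g=0.03), 2))  # e.g. 4% discount rate, 3% growth
```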

Maybe 

Other scattered remarks

Perhaps the virtue ethicist part of you will feel partly assuaged by GiveDirectly's blog post about the project? I'm thinking in particular of these sections (warning: long quotes):

GiveDirectly confirmed recipients and communities want to be featured, as always:

  • For all media projects, we first consult with village leadership to confirm their interest and consent for participating. For this video, we also met with local and national government officials to confirm if they were supportive of such a large spotlight. 
  • Journalists and content creators always follow this guidance when visiting GiveDirectly programs. Profiled recipients first give informed consent before sharing their story. You can read our consent forms here→

Beast Philanthropy centered the local culture:

  • They regularly solicited input from our local staff about whether approaches and portrayals would be received well by the community and had us give notes on the video edit. 
  • They focused on English-speakers so recipients could share more of their story in their own voice. 
  • They worked to capture the cultural specificity of the community, forgoing stock music for natural sounds→

After filming, GiveDirectly’s safeguarding team interviewed 9 of the filmed recipients. You can read their feedback here – some highlights:

Recipients enjoyed being on camera.

  • “The way they came and interacted with me and my family, that’s what I liked most. I felt in place and free with them.”
  • “I was very happy and I welcomed them. I showed them my land agreement together with the land, iron sheets (for my new roof) and some household materials.” 

Their motivations for participating varied.

  • “I did accept to participate because of the challenges and poverty that my community members are facing. I needed to represent their views.”
  • “I needed to tell how happy I felt and also to show the rest of the community members that when given something small or large you can always use it in a way that can help raise your standard of living.”

Two gave us actionable feedback for how we can improve next time.

  • “I was relaxed and very happy, though my husband got anxious about the number of GiveDirectly staff who visited us.”
  • “I felt good about it, though I feel I should also be shown the photos and videos to watch.”

Later this month, we’ll screen the video for the featured community dubbed into Nga’Karimojong (their language), followed by a focus group discussion, then update this blog with their thoughts on the final video.

This was to me a surprising amount of beneficiary thoughtfulness for a MrBeast video (admittedly I don't watch his content often), albeit in line with my expectations for GiveDirectly.

Upvoted :) 

I agree with Ben Millwood's comment – I don't think this would change many decisions in practice.

To add another point, input parameter uncertainty is larger than you probably think, even for direct-delivery GHD charities (let alone policy or meta orgs). The post Quantifying Uncertainty in GiveWell Cost-Effectiveness Analyses visualises this point particularly vividly; you can see how a 10% change doesn't really change prioritisation much:

| Intervention | GiveWell | Our Mean | 95% CI | Difference |
|---|---|---|---|---|
| Against Malaria Foundation | 0.0375 | 0.0384 | 0.0234 - 0.0616 | +2.4% |
| GiveDirectly | 0.00335 | 0.00359 | 0.00167 - 0.00682 | +7% |
| Helen Keller International | 0.0541 | 0.0611 | 0.0465 - 0.0819 | +12.8% |
| Malaria Consortium | 0.031 | 0.0318 | 0.0196 - 0.0452 | +2.52% |
| New Incentives | 0.0458 | 0.0521 | 0.0139 - 0.117 | +13.8% |

(Look at how large those 95% CIs are vs a 10% change.)

I think a useful way to go about this is to ask: what would have to change to alter the decisions (e.g. top-recommended charities, intervention ideas turned into incubated charities, etc.)? This gets you into uncertainty analysis, for which I'd point you to froolow's Methods for improving uncertainty analysis in EA cost-effectiveness models.
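As a toy illustration of both points (how wide CIs swamp a ~10% shift, and the "what would have to change to flip the decision" framing), here's a small Monte Carlo sketch. The distributions and numbers are made up; they are not GiveWell's, the linked post's, or froolow's models:

```python
import numpy as np

# Toy switch-point check with made-up numbers: how often does charity B beat
# charity A once input uncertainty is propagated, and does a 10% haircut to
# A's point estimate change that materially?
rng = np.random.default_rng(1)
n = 100_000

# Cost-effectiveness (value per $) modelled as lognormals around point estimates.
a = rng.lognormal(mean=np.log(0.054), sigma=0.25, size=n)  # hypothetical charity A
b = rng.lognormal(mean=np.log(0.038), sigma=0.35, size=n)  # hypothetical charity B

print(f"P(B > A)       = {np.mean(b > a):.1%}")
print(f"P(B > 0.9 * A) = {np.mean(b > 0.9 * a):.1%}")  # a 10% shift barely moves it
```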

The ARC Prize website takes this definitional stance on AGI:

Consensus but wrong:

AGI is a system that can automate the majority of economically valuable work.

Correct:

AGI is a system that can efficiently acquire new skills and solve open-ended problems.

Something like the former definition, central to reports like Tom Davidson's CCF-based takeoff speeds for Open Phil, basically drops out of (the first half of the reasoning behind) the big-picture view summarized in Holden Karnofsky's most important century series: to quote him, the long-run future would be radically unfamiliar and could come much faster than we think, simply because standard economic growth models imply that any technology that could fully automate innovation would cause an "economic singularity"; one such technology could be what Holden calls PASTA ("Process for Automating Scientific and Technological Advancement"). In What kind of AI? he elaborates (emphasis mine):

I mean PASTA to refer to either a single system or a collection of systems that can collectively do this sort of automation. ...

By talking about PASTA, I'm partly trying to get rid of some unnecessary baggage in the debate over "artificial general intelligence." I don't think we need artificial general intelligence in order for this century to be the most important in history. Something narrower - as PASTA might be - would be plenty for that. ...

I don't particularly expect all of this to happen as part of a single, deliberate development process. Over time, I expect different AI systems to be used for different and increasingly broad tasks, including and especially tasks that help complement human activities on scientific and technological advancement. There could be many different types of AI systems, each with its own revenue model and feedback loop, and their collective abilities could grow to the point where at some point, some set of them is able to do everything (with respect to scientific and technological advancement) that formerly required a human.

This is why I think it's basically justified to care about economy-growing automation of innovation as "the right working definition" from the x-risk reduction perspective for a funder like Open Phil in particular, which isn't what an AI researcher like Francois Chollet cares about. Which is fine; different folks care about different things. But calling the first definition "wrong" feels like the sort of mistake you make when you haven't at least made a good-faith effort to do what Scott suggested here with the first definition:

... if you're looking into something controversial, you might have to just read the biased sources on both sides, then try to reconcile them.

Success often feels like realizing that a topic you thought would have one clear answer actually has a million different answers depending on how you ask the question. You start with something like "did the economy do better or worse this year?", you find that it's actually a thousand different questions like "did unemployment get better or worse this year?" vs. "did the stock market get better or worse this year?" and end up with things even more complicated like "did employment as measured in percentage of job-seekers finding a job within six months get better" vs. "did employment as measured in total percent of workforce working get better?". Then finally once you've disentangled all that and realized that the people saying "employment is getting better" or "employment is getting worse" are using statistics about subtly different things and talking past each other, you use all of the specific things you've discovered to reconstruct a picture of whether, in the ways important to you, the economy really is getting better or worse.

Note also that PASTA is definitionally a lot looser than the AGI defined in Metaculus' When will the first general AI system be devised, tested, and publicly announced? (2031 as of time of writing), which requires the sort of properties Chollet would probably approve of (a single unified software system, not a cobbled-together set of task-specialized subsystems) – yet if the PASTA collective functionally completes the innovation -> resources -> PASTA -> innovation -> ... economic growth loop, that would already be x-risk relevant. The argument would then need to be "something like Chollet's / Metaculus' definition is necessary to complete the growth loop", which would be a testable hypothesis.
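To illustrate why that growth loop matters, here's a crude discrete-time toy with made-up parameters and functional forms – not Tom Davidson's model or anyone else's, just the qualitative feedback it relies on. Once research effort can be bought with output rather than being capped by a fixed human workforce, the toy's technology level explodes rather than growing smoothly:

```python
# Crude toy of the innovation -> resources -> automated research -> innovation
# loop. All parameters and functional forms are made up for illustration.
def simulate(years: float, automated: bool, phi: float = 0.5,
             rate: float = 0.05, dt: float = 0.1):
    a, t = 1.0, 0.0                            # a = technology level; output ~ a
    while t < years and a < 1e12:              # stop early if the toy explodes
        research = a if automated else 1.0     # effort bought with output vs fixed
        a += dt * rate * research * a ** phi   # toy ideas-production function
        t += dt
    return a, t

for automated in (False, True):
    a, t = simulate(60, automated)
    label = "automated research" if automated else "human-only research"
    print(f"{label}: technology level {a:,.1f} after {t:.0f} years")
```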

AMF does. Quoting Rob Mathers (AMF's CEO) from his recent post, emphasis mine:

Many recognise the impact of AMF’s work, yet we still have significant immediate funding gaps that are over US$300m. ...

There is already a significant shortfall in funding for malaria control activities, including for net distribution programmes so miraculous things will have to happen in the coming year if we are to get anywhere close, globally and across all funding partners, to where we need to be to be able to drive malaria impact numbers down. Counterfactually of course, if the funding that is being brought to bear was not there, the number of people affected by malaria would be horrifically higher. Currently there are ~620,000 deaths a year from malaria and 250 million people fall sick. 

The Global Fund is the world’s largest funder of malaria control activities and has a funding replenishment round every three years, with funding provided by global governments, that determines the funds it has available across three disease areas: HIV/Aids, malaria and TB. The target for the 2024 to 2026 period was raising US$18 billion, largely to stand still. The funding achieved was US$15.7 billion. The shortfall will have major ramifications and we are already seeing the impact in planning in the Democratic Republic of Congo, one of the two countries in the world worst affected by malaria, for the 2024 to 2026 programme. Currently only 65% of the nets desperately needed will be able to be funded. We have never had this low a percentage of funding at this stage, with limited additional funding forecast.

The latest actual publicly-available room-for-more-funding (RFMF) figure I can find for AMF, and the other top GW charities, is here from Q3 2020, which is probably what you're referring to in the OP by "It's hard to find up-to-date data"; back then it was just $37.8M, nearly an order of magnitude lower, although I'm not sure whether Rob's and GiveWell's RFMF figures are like for like.

The justifications for these grants tend to use some simple expected value calculation of a singular rosy hypothetical causal chain. The problem is that it's possible to construct a hypothetical value chain to justify any sort of grant. So you have to do more than just make a rosy causal chain and multiply numbers through.

Worth noting that even GiveWell doesn't rely on a single EV calculation either (however complex). Quoting Holden's 10-year-old writeup Sequence thinking vs. cluster thinking:

Our approach to making such comparisons strikes some as highly counterintuitive, and noticeably different from that of other “prioritization” projects such as Copenhagen Consensus. Rather than focusing on a single metric that all “good accomplished” can be converted into (an approach that has obvious advantages when one’s goal is to maximize), we tend to rate options based on a variety of criteria using something somewhat closer to (while distinct from) a “1=poor, 5=excellent” scale, and prioritize options that score well on multiple criteria.

We often take approaches that effectively limit the weight carried by any one criterion, even though, in theory, strong enough performance on an important enough dimension ought to be able to offset any amount of weakness on other dimensions. 

... I think the cost-effectiveness analysis we’ve done of top charities has probably added more value in terms of “causing us to reflect on our views, clarify our views and debate our views, thereby highlighting new key questions” than in terms of “marking some top charities as more cost-effective than others.”
