I am the Principal Research Director at Rethink Priorities. I lead our Surveys and Data Analysis department and our Worldview Investigation Team.
The Worldview Investigation Team previously completed the Moral Weight Project and CURVE Sequence / Cross-Cause Model. We're currently working on tools to help EAs decide how they should allocate resources within portfolios of different causes, and on how to use a moral parliament approach to allocate resources given metanormative uncertainty.
The Surveys and Data Analysis Team primarily works on private commissions for core EA movement and longtermist orgs, where we provide survey methodology and data analysis.
Formerly, I also managed our Wild Animal Welfare department. I've previously worked for Charity Science and been a trustee at Charity Entrepreneurship and EA London.
My academic interests are in moral psychology and methodology at the intersection of psychology and philosophy.
If true, it could mean that any theory framed in opposition, such as a critique of Shorttermism or Longtermism, might be more appealing than the time-focused theory itself. Criticising short-term thinking is an applause light in many circles.
I agree this could well be true at the level of arguments, i.e. I think there are probably longtermist (anti-shorttermist) framings which would be successful. But I suspect it would be harder to make this work at the level of framing/branding a whole movement, i.e. I think promoting the 'anti-shorttermist' movement would be hard to do successfully.
It takes a significant amount of time to mark a test task. But this can be fixed by just adjusting the height of the screening bar, as opposed to using credentialist and biased methods (like looking at someone's LinkedIn profile or CV).
Whether or not to use "credentialist and biased methods (like looking at someone's LinkedIn profile or CV)" seems orthogonal to the discussion at hand?
The key issue seems to be that if you raise the screening bar, then you would be admitting fewer applicants to the task (the opposite of the original intention).
This is an empirical question, and I suspect it is not true. For example, it took me 10 minutes to mark each candidate's 1-hour test task. So my salary would need to be 6x higher (per unit time) than the test task payment for this to be true.
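(To spell out the arithmetic behind the quoted 6x figure, using the quoted 10 minutes of marking per 1-hour task, and writing $w_m$ for the grader's hourly cost and $w_t$ for the hourly test-task payment; these symbols are just my labels:

$$\text{marking cost} = \tfrac{10}{60}\,w_m, \qquad \text{payment cost} = 1\,\text{hr} \times w_t,$$
$$\text{so marking cost} > \text{payment cost} \iff \tfrac{1}{6}\,w_m > w_t \iff w_m > 6\,w_t.$$

That is, the grader's time only costs more than the candidate payment if the grader's hourly cost exceeds six times the test-task rate.)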
This will definitely vary by org and by task. But many EA orgs report valuing their staff's time extremely highly. And my impression is that both grading longer tasks and then processing the additional applicants (many orgs will also feel compelled to offer at least some feedback if a candidate has completed a multi-hour task) will often take much longer than 10 minutes total.
Orgs can continue to pay top candidates to complete the test task, if they believe it measurably decreases the attrition rate, but give all candidates that pass an anonymised screening bar the chance to complete a test task.
My guess is that, for many orgs, the time cost of assessing the test task is larger than the financial cost of paying candidates to complete the test task, and that significant reasons for wanting to compensate applicants are (i) a sense of justice and (ii) wanting to avoid the appearance of unreasonably demanding lots of unpaid labour from applicants, rather than just wanting to encourage applicants to complete the tasks[1].
So I agree that there are good reasons for wanting more people to be able to complete test tasks. But I think that doing so would potentially significantly increase costs to orgs, and that not compensating applicants would reduce costs to orgs by less than one might imagine.
I also think the justice implications of compensating applicants are unclear (offering pay for longer tasks may make them more accessible to poorer applicants).
I think that many applicants are highly motivated to complete tasks, in order to have a chance of getting the job.
I guess it depends on the specifics of the situation, but, to me, the case described, of a board member making one or two incorrect claims (in a comment that presumably also had a bunch of accurate and helpful content) that they needed to walk back sounds… not that bad? Like, it seems only marginally worse than their comment being fully accurate the first time round...
I agree that it depends on the situation, but I think this would often be quite a lot worse in real, non-ideal situations. In ideal communicative situations, mistaken information can simply be corrected at minimal cost. But in non-ideal situations, I think one will often see things like:
Fwiw, I think different views about this ideal/non-ideal distinction underlie a lot of disagreements about communicative norms in EA.
Thanks Ben!
I don't think there's a single way to interpret the magnitude of the differences or the absolute scores (e.g. a single effect size), so it's best to examine this in a number of different ways.
One way to interpret the difference between the ratings is to look at the probability of superiority scores. For example, for Study 3 we showed that ~78% of people would be expected to rate AI safety (6.00) higher than longtermism (4.75). In contrast, for AI safety vs effective giving (5.65), it's 61%, and for GCRR (5.95) it's only about 51%.
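For readers unfamiliar with the measure, here's a minimal sketch of one common way a probability-of-superiority score can be computed from two sets of ratings (ties counted as half-wins). The ratings below are made up for illustration and are not the study's data, and this isn't necessarily the exact estimator we used:

```python
import numpy as np

def probability_of_superiority(x, y):
    """Estimate P(a randomly drawn rating from x is higher than one from y),
    counting ties as 0.5, via all pairwise comparisons."""
    x, y = np.asarray(x), np.asarray(y)
    diffs = x[:, None] - y[None, :]   # every rating in x compared with every rating in y
    wins = (diffs > 0).sum()
    ties = (diffs == 0).sum()
    return (wins + 0.5 * ties) / diffs.size

# Made-up 7-point ratings, purely illustrative
ai_safety   = [6, 7, 5, 6, 7, 6, 5]
longtermism = [4, 5, 5, 3, 6, 5, 4]
print(probability_of_superiority(ai_safety, longtermism))  # ~0.87 for these illustrative ratings
```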
You can also examine the (raw and weighted) distributions of the responses. This allows one to assess directly how many people "Like a great deal", "Dislike a great deal" and so on.
You can also look at different measures, which have a more concrete interpretation than liking. We did this with one (interest in hearing more information about a topic). But in future studies we'll include additional concrete measures, so we know e.g. how many people say they would get involved with x movement.
I agree that comparing these responses to other similar things outside of EA (like "positive action" but on the negative side) would be another useful way to compare the meaning of these responses.
One other thing to add is that the design of these studies isn't optimised for assessing the effect of different names in absolute terms, because every subject evaluated every item ("within-subjects"). This allows greater statistical power more cheaply, but the evaluations are also more likely to be implicitly comparative. To get an estimate of something like the difference in the number of people who would be interested in x rather than y (assuming they would only encounter one or the other in the wild at a single time), we'd want to use a between-subjects design where people only evaluate one item and indicate their interest in it.
Thanks Ben!
For descriptions, or descriptions with terms, the questions were "How much do you like or dislike each of the following?" (Study 2) and "Please indicate the extent to which you like or dislike the following" (Study 3), both asked on a 7-point scale from "Dislike a great deal" to "Like a great deal".
For terms only, we included some more instructions to ensure that people evaluated the terms specifically, rather than trying to evaluate the things referred to by the terms: "Based only on the terms provided below, how much do you like or dislike each of the following? (Please note that you do not have to have heard of the terms to answer. Please just answer based on the names below without looking up additional information)." This was also asked on a 7-point scale from "Dislike a great deal" to "Like a great deal".
For cause areas (Study 1), the question was: "Based on the description of each cause area below, please indicate to what extent you like or dislike a movement that promotes giving resources to this area." And, again, we used the same 7-point scale from "Dislike a great deal" to "Like a great deal".
Thanks Deborah!
I'd like to see more research, however, as to why longtermism performed poorly in comparison with global catastrophic risks, because many of the latter play out on a long-term timescale.
I think there are likely multiple different factors here:
Specific cause areas like AI safety and pandemic preparedness were generally better liked than broader concepts like EA or longtermism.
The summary was generally good, but I wouldn't say the above exactly. In the one study where we tested specific causes against broader concepts, AI safety and Pandemic preparedness were roughly neck and neck with the general broader concept Global catastrophic risk reduction. Those three were more popular than Climate change (specific) and Effective Altruism and Effective Giving (broader), which were neck and neck with each other. And all were more popular than Longtermism. So there wasn't a clear difference along the specific cause area vs broader concept distinction.
Thanks! This was supported via Manifund and then topped up by Open Philanthropy after they saw it there (so thanks to our donors on Manifund and to Open Phil!).
I checked, and people who currently work in an EA org are only slightly older on average (median 29 vs median 28).