Scott Alexander

Okay, so GWWC, LW, and GiveWell, what are we going to do to reverse the trend?

Seriously, should we be thinking of this as "these sites are actually getting less effective at recruiting EAs" or as "there are so many more recruitment pipelines now that it makes sense that each one would drop in relative importance" or as "any site will naturally do better in its early years as it picks the low-hanging fruit in converting its target population, then do worse later"?

The first point seems to be saying that we should factor in the chance that a program works into cost-effectiveness analysis. Isn't this already a part of all such analyses? If it isn't, I'm very surprised by that and think it would be a much more important topic for an essay than anything about PEPFAR in particular.

The second point, that people should consider whether a project is politically feasible, is well taken. It sounds like the lesson here is "if you find yourself in a situation where you have to recommend either project A or B, and both are good, but A is better than B, but if you do activism for A it still won't happen, but if you do activism for B your input will push it over the edge into happening, do activism for B." I agree with this as far as it goes, but there seem like some important caveats:

  • You shouldn't lie, for the normal reasons against lying. It sounds like some of these economists were just publishing reports or articles saying that antimalarials were better than PEPFAR, and I think if that was what they found, then publishing that in reports is correct regardless of the politics. Then when they put on their activist hats they can support whatever is most effective to support.
  • It's a really hard problem to know whether to pursue less ambitious vs. more ambitious goals. If you're a socialist, should you spend your time fighting for a $1 minimum wage increase, for Medicare For All, or for completely dismantling capitalism and replacing it with workers' communes? I don't think there's an obvious answer, even though the first is clearly more likely to succeed than the second two. In retrospect it's clear that PEPFAR worked politically, and if you say that more cost-effective options wouldn't have worked politically at that time then I believe you, but I don't want to conclude that therefore less ambitious political projects are always better than more ambitious ones, and I don't know how else to apply the lesson from this story more generally.

I find this interesting, but I also find it somewhat hard to identify any meaningful patterns. For example, one could expect red points to be clustered at the top for Manifold, indicating that more forecasts equal better performance. But we don't see that here. The comparison may be somewhat limited anyway: In the eyes of the Metaculus community prediction, all forecasts are created equal. On Manifold, however, users can invest different amounts of money. A single user can therefore in principle have an outsized influence on the overall market price if they are willing to spend enough. I'd be interested to see more on how accuracy on Manifold changes with the number of traders and overall trading volume. Who knows, maybe Manifold would be ahead if they had a similar number of forecasters to Metaculus?

Does this mean that, if you controlled for number of forecasters, you would still expect Metaculus to beat Manifold? If not, do you have any opinion on this question? (Sorry if I missed it.)

Thank you. I misremembered the transcription question. I now agree with all of your resolutions, with the most remaining uncertainty on translation.

Thank you for doing this! I was working on a similar project and mostly came up with the same headline finding as you: the experts seemed well-calibrated. I did decide a few of the milestones a little differently, and would like to hear why you chose the way you did so I can decide whether or not to change mine.

  • Zach Stein-Perlman from AI Impacts said that he thought "efficiently sort very large lists" and "write good Python code" were false, because the questions said it had to be done in a certain way by a certain type of neural net, and that wasn't how it was done.
  • I was planning to count "transcribe as well as humans" as false, based on https://interactiveaimag.org/columns/is-ai-at-human-parity-yet-a-case-study-on-speech-recognition/ . Maybe the top labs could achieve this with a year of work, but I think the question specifies they need to do as well as the best human transcriptionists, and right now they don't seem close.
  • I counted "translate as well as bilingual humans" as true based on a few quick tests of ChatGPT; I'm curious if you have some specific source for why it's false.
  • I don't think AI has won at Starcraft. The last word I've heard on this was https://www.extremetech.com/extreme/301325-deepminds-starcraft-ii-ai-can-now-defeat-99-8-percent-of-human-players , where AlphaStar could beat 99.8% of humans but not the absolute champions. I haven't seen any further progress on this since 2019. Again, it's possible that a year of concerted effort could change this, but that seems speculative. See also https://www.reddit.com/r/starcraft/comments/uakohx/why_cant_we_make_a_perfect_ai_for_starcraft/
  • I'm surprised you judged "high marks for a high school essay" as false; this seems like a central use case for ChatGPT and Bing/GPT4.
  • I was planning to judge "concisely explain game play" as true, based on https://www.forbes.com/sites/carlieporterfield/2022/11/22/metas-ai-gamer-beat-humans-in-diplomacy-using-strategy-and-negotiation/, which is testing basically this skill. Also, I was able to play a partial game of chess with ChatGPT where it explained all its moves - before it started hallucinating and making moves which were impossible. Still, it seemed to have the "explanation" skill down pat! I imagine if you asked it to explain why a chess engine made a given move, it would give a pretty plausible answer.

Beyond those quibbles - I was also looking at https://aiimpacts.org/2022-expert-survey-on-progress-in-ai/#Data (the dataset itself; the summary doesn't include the milestones). This new version seems like total garbage. The experts continue to predict several of the milestones are five years out, including milestones that were achieved by ChatGPT (ie a few months after the survey) and at least one milestone that had already clearly been achieved by the time the survey was released! Unless there's some reason to think the new crop of experts is worse than the old one, this makes me think they only did okay last time by luck/coincidence, and actually they have no idea what they're doing.

(I don't think it works to say that the period 2017-2022.5 was predictable, but the period 2022.5-2023 wasn't, because part of what the 2017 experts were right about was ChatGPT, which came out in late 2022.)
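To be concrete about what I mean by "well-calibrated": the check I had in mind is roughly the sketch below, which groups the experts' aggregate probabilities into coarse buckets and compares each bucket's average forecast to the fraction of milestones that actually resolved true, plus a Brier score as a single summary number. The probabilities and resolutions in the sketch are made-up placeholders, not the actual AI Impacts data.

```python
# Minimal calibration check. The forecasts and resolutions below are
# made-up placeholders, NOT the actual AI Impacts survey data.
from statistics import mean

# (aggregate probability the experts gave the milestone, did it resolve true?)
placeholder_milestones = [
    (0.85, True), (0.70, True), (0.60, False), (0.45, True),
    (0.30, False), (0.20, False), (0.75, True), (0.55, False),
]

# Group forecasts into three coarse buckets and compare the average forecast
# in each bucket to the fraction of milestones that actually happened.
buckets = {"0-33%": [], "33-67%": [], "67-100%": []}
for prob, resolved_true in placeholder_milestones:
    key = "0-33%" if prob < 1/3 else "33-67%" if prob < 2/3 else "67-100%"
    buckets[key].append((prob, resolved_true))

for key, items in buckets.items():
    if not items:
        continue
    avg_forecast = mean(p for p, _ in items)
    hit_rate = mean(1.0 if r else 0.0 for _, r in items)
    print(f"{key}: mean forecast {avg_forecast:.0%}, "
          f"resolved true {hit_rate:.0%} ({len(items)} milestones)")

# Brier score as a single summary (lower is better; 0.25 is what you'd get
# from always answering 50%).
brier = mean((p - (1.0 if r else 0.0)) ** 2 for p, r in placeholder_milestones)
print(f"Brier score: {brier:.3f}")
```

A well-calibrated group would have hit rates that roughly track the mean forecast in each bucket, which is the standard the 2017 answers seemed to meet and the 2022 answers already seem to fail.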

Thanks for asking. One reason we decided to start with forecasting is that we think it carries relatively low risk compared to other fields like AI or biotech.

If this goes well and we move on to a more generic round, we'll include our thoughts on this, which will probably include a commitment not to oracular-fund projects that seem like they were risky when proposed, and maybe to ban some extremely risky projects from the market entirely. I realize we didn't explicitly say that here, which is because this is a simplified test round and we think the forecasting focus makes risks pretty unlikely.

In the unlikely event that someone proposes a forecasting project < $20,000 which we think carries significant risk, we're prepared to take those steps this time too.

In 2018, I collected data about several types of sexual harassment on the SSC survey, which I will report here to help inform the discussion. I'm going to simplify by assuming that only cis women are victims and only cis men are perpetrators, even though that's bad and wrong.

Women who identified as EA were less likely to report lifetime sexual harassment at work than other women, 18% vs. 20%. They were also less likely to report being sexually harassed outside of work, 57% vs. 61%.

Men who identified as EA were less likely to admit to sexually harassing people at work (2.1% vs. 2.9%) or outside of work (16.2% vs. 16.5%).

The sample was 270 non-EA women, 99 EA women, 4940 non-EA men, and 683 EA men. None of these results were statistically significant, although all of them trended in the direction of EAs experiencing less sexual harassment. 
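If anyone wants to sanity-check the "not statistically significant" claim, here's a rough sketch of the comparison in Python. The counts are back-calculated from the rounded percentages and sample sizes above rather than taken from the raw survey data, so the p-values are only approximate.

```python
# Rough two-proportion z-tests on the reported rates. Counts are
# back-calculated from the rounded percentages quoted above, so the
# p-values are approximate rather than exact.
from statsmodels.stats.proportion import proportions_ztest

comparisons = {
    # label: (EA rate, EA n, non-EA rate, non-EA n)
    "women harassed at work":      (0.180,  99, 0.200,  270),
    "women harassed outside work": (0.570,  99, 0.610,  270),
    "men harassing at work":       (0.021, 683, 0.029, 4940),
    "men harassing outside work":  (0.162, 683, 0.165, 4940),
}

for label, (rate_ea, n_ea, rate_non, n_non) in comparisons.items():
    counts = [round(rate_ea * n_ea), round(rate_non * n_non)]
    nobs = [n_ea, n_non]
    _, p_value = proportions_ztest(counts, nobs)
    print(f"{label}: EA {counts[0]}/{n_ea} vs. non-EA {counts[1]}/{n_non}, "
          f"p = {p_value:.2f}")
```

None of the four comparisons should come anywhere near significance at these sample sizes, which is consistent with the summary above.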

This doesn't prove that EA environments have less harassment than the average environment, since it could be that EAs are biased to have less sexual harassment for other reasons, and whatever additional harassment they get in EA isn't enough to make up for it; the vast majority of EAs have the vast majority of their interactions in non-EA environments.

I tried to sort of get around this by limiting my analysis to people living in California, on the grounds that they were more likely to be plugged into EA communities and jobs. Conditional on being a woman in California, being EA did make someone more likely to experience sexual harassment, consistently, as measured in many different ways. But Californian EAs were also younger, much more bisexual, and much more polyamorous than Californian non-EAs; adjusting for sexuality and polyamory didn't remove the gap, but age was harder to adjust for and I didn't try. EAs who said they were working at charitable jobs that they had explicitly calculated were effective had lower harassment rates than the average person, but those working at charitable jobs whose effectiveness they hadn't explicitly calculated had higher rates. All of these subgroup analyses had very small sample sizes.

Overall I am not sure that anything can be concluded from these results either way.

I would urge everyone thinking about this question to read my original discussion of the sexual harassment survey results. It mostly focuses on professions but I think the overall conclusion is extremely relevant here too. You can also find the link to the data there in case you want to double-check my results.

Minor object-level objection: you say we should predict that crypto exchanges like FTX will fail, but I tried to calculate the risk of this in the second part of my post, and the average FTX-sized exchange fails only very rarely.

I don't think this is our main point of disagreement though. My main point of disagreement is about how actionable this is and what real effects it can have.

I think that the main way EA is "affiliated with" crypto is that it has accepted successful crypto investors' money. Of the people who have donated the most to EA, I think about 5-7 of the top ten names made their money in something crypto-related (even counting all the FTX people as one donor). Some of those people (example: Vitalik Buterin) are well-liked, honest, and haven't done anything to embarrass us. I think it would be practically bad to stop accepting their money, and morally bad (as a betrayal) to denounce them and write them out of the movement based on guilt by association. (CoI note: I have benefited from non-FTX crypto money)

I see you're not recommending that EA stop taking crypto money. But then I'm not sure what you do want, other than what's already happening:

  • You recommend EA not invest in crypto, but I don't think the movement is really doing this, and if they are I would expect that to be for normal economic reasons like diversification (and in general I expect Open Phil's investment managers to know more than us).
  • You recommend that organizations not put crypto people on their board, but I don't know of this happening except when those people have already been EAs before getting into crypto (I think SBF was on CEA's board before he got into crypto, although I could be wrong). If it was happening in other cases, I would assume it was because of standard practices around very big donors getting on the board, and not because EAs love crypto so much that they invite random crypto leaders to join company boards. If you know of examples to the contrary I would be interested in hearing them.
  • You recommend not boasting about ties to crypto insiders. I haven't seen this happen except with SBF, where I think the boasting was along the lines of "look how well this person earning to give paid off". I agree that people should do less of that in the future.

Although the point of "don't invite random crypto scammers to serve on your board and become the public face of EA for no reason" is obviously correct, I don't know of anyone actually doing this, and so I worry that the real effect of posts like this will be to slowly make crypto so toxic in this community that EA leaders feel pressured to refuse crypto donations for PR reasons, and then we lose > half of our potential money. I'm especially worried about some kind of purity spiral, where after crypto is toxified, the next level is people arguing that Facebook has also been a pretty evil company at various points and so maybe we shouldn't accept Dustin's money either. I don't see a good Schelling fence here and would prefer not to start down that slope. I think we should avoid associating with (including taking money from) anyone who seems likely to be an outright fraud or to be breaking the law, and maybe some extremely harmful industries like tobacco, but not try to more generally be the arbiters of which industries are vs. aren't socially productive.

Thanks for your thoughtful response.

I'm trying to figure out how much of a response to give, and how to balance saying what I believe against avoiding any chance of making people feel unwelcome, or of inflicting an unpleasant politicized debate on people who don't want to read it. This comment is a bad compromise between all these things and I apologize for it, but:

I think the Kathy situation is typical of how effective altruists respond to these issues and what their failure modes are. I think "everyone knows" (in Zvi's sense of the term, where it's such strong conventional wisdom that nobody ever checks if it's true) that the typical response to rape accusations is to challenge and victim-blame survivors. And that although this may be true in some times and places, the typical response in this community is the one which, in fact, actually happened - immediate belief by anyone who didn't know the situation, and a culture of fear preventing those who did know the situation from speaking out. I think it's useful to acknowledge and push back against that culture of fear.

(this is also why I stressed the existence of the amazing Community Safety team - I think "everyone knows" that EA doesn't do anything to hold men accountable for harm, whereas in fact it tries incredibly hard to do this and I'm super impressed by everyone involved)

I acknowledge that makes it sound like we have opposing cultural goals - you want to increase the degree to which people feel comfortable pointing out that EA's culture might be harmful to women, I want to increase the degree to which people feel comfortable pushing back against claims to that effect which aren't true. I think there is some subtle complicated sense in which we might not actually have opposing cultural goals, but I agree that to a first-order approximation they sure do seem different. And I realize this is an annoyingly stereotypical situation - I, as a cis man, coming into a thread like this and saying I'm worried about false accusations and chilling effects. My only two defenses are, first, that I only got this way because of specific real and harmful false accusations, that I tried to do an extreme amount of homework on them before calling them false, and that I only ever bring them up in the context of defending my decision there. And second, that I hope I'm possible to work with and feel safe around, despite my cultural goals, because I want to have a firm deontological commitment to promoting true things and opposing false things, in a way that doesn't refer to my broader cultural goals at any point.
