I'm Aaron, I've done Uni group organizing at the Claremont Colleges for a bit. Current cause prioritization is AI Alignment.
I am not aware of modeling here, but I have thought about this a bit. Besides what you mention, some other ways I think this story may not pan out (very speculative):
My current thinking is that there's a >20% chance that EA-oriented funders should be saving significant money to spend on compute for autonomous researchers, and it is an important thing for them to gain clarity on. I want to point out that there is probably a partial-automation phase (like point 3 above) before a full-automation phase. The partial-automation phase has less opportunity to usefully spend money on compute (plausibly still in the tens of millions of dollars), but our actions are more likely to matter. After that comes the full-automation phase where money can be scalably spent to e.g., differentially speed up alignment vs. AI capabilities research by hundreds of millions of dollars, but there's a decent chance our actions don't matter then.
As you mention, perhaps our actions don't matter then because humans don't control the future. I would emphasize that if we have fully autonomous, no-humans-in-the-loop research happening without already having good alignment of those systems, it's highly likely that we get disempowered. That is, it might not make sense to aim to do alignment research at that point because either the crucial alignment work was already done, or we lose. Conditional on having aligned systems at this point, having saved money to spend on altruistically motivated cognitive work probably isn't very important because economic growth gets going really fast and there's plenty of money to be spent on non-alignment altruistic causes. On the other hand, something something at that point it's the last train on its way to the dragon and it sure would be sad to not have money saved to buy those bed-nets.
A few weeks ago I did a quick calculation for the amount of digital suffering I expect in the short term, which probably gets at your question about these sizes. tl;dr of my thinking on the topic:
You can read the slightly more thorough, but still extremely rough and likely wrong BOTEC here
Thanks for your response. I'll just respond to a couple things.
Re Constitutional AI: I agree normatively that it seems bad to hand over judging AI debates to AIs[1]. I also think this will happen. To quote from the original AI Safety via Debate paper,
Human time is expensive: We may lack enough human time to judge every debate, which we can address by training ML models to predict human reward as in Christiano et al. [2017]. Most debates can be judged by the reward predictor rather than by the humans themselves. Critically, the reward predictors do not need to be as smart as the agents by our assumption that judging debates is easier than debating, so they can be trained with less data. We can measure how closely a reward predictor matches a human by showing the same debate to both.
Re
We'd also really contest the 'perform very similarly to human raters' is enough---it'd be surprising if we already have a free lunch, no information lost, way to simulate humans well enough to make better AI.
I also find this surprising, or at least I did the first 3 times I came across medium-quality evidence pointing this direction. I don't find it as surprising any more because I've updated my understanding of the world to "welp, I guess 2023 AIs actually are that good on some tasks." Rather than making arguments to try and convince you, I'll just link some of the evidence that I have found compelling, maybe you will too, maybe not: Model Written Evals, MACHIAVELLI benchmark, Alpaca (maybe the most significant for my thinking), this database, Constitutional AI.
I'm far from certain that this trend, of LLMs being useful for making better LLMs and for replacing human feedback, continues rather than hitting a wall in the next 2 years, but it does seem more likely than not to me, based on my read of the evidence. Some important decisions in my life rely on how soon this AI stuff is happening (for instance if we have 20+ years I should probably aim to do policy work), so I'm pretty interested in having correct views. Currently, LLMs improving the next generation of AIs via more and better training data is one of the key factors in how I'm thinking about this. If you don't find these particular pieces of evidence compelling and are able to explain why, that would be useful to me!
I'm actually unsure here. I expect there are some times where it's fine to have no humans in the loop and other times where it's critical. It generally gives me the ick to take humans out of the loop, but I expect there are some times where I would think it's correct.
The article doesn't seem to have a comment section so I'm putting some thoughts here.
While my comment has been negative and focused on criticism, I am quite glad this article was written. Feel free to check out a piece I wrote, laying out some of my thinking around powerful AI coming soon, which is mostly orthogonal to this article. This comment was written sloppily, partially as my off-the-cuff notes while reading, sorry for any mistakes and impolite tone.
I'm not Buck, but I can venture some thoughts as somebody who thinks it's reasonably likely we don't have much time.
Given that "I'm skeptical that humans will go extinct in the near future" and that you prioritize preventing suffering over creating happiness, it seems reasonable for you to condition your plan on humanity surviving the creation of AGI. You might then back-chain from possible futures you want to steer toward or away from. For instance, if AGI enables space colonization, it sure would be terrible if we just had planets covered in factory farms. What is the path by which we would get there, and how can you change it so that we have e.g., cultured meat production planets instead. I think this is probably pretty hard to do; the term "singularity" has been used partially to describe that we cannot predict what would happen after it. That said, the stakes are pretty astronomical such that I think it would be pretty reasonable for >20% of animal advocacy effort to be specifically aimed at preventing AGI-enabled futures with mass animal suffering. This is almost the opposite of "we have ~7 years to deliver (that is, realise) as much good as we can for animals." Instead it might be better to have an attitude like "what happens after 7 years is going to be a huge deal in some direction, let's shape it to prevent animal suffering."
I don't know what kind of actions would be recommended by this thinking. To venture a guess: trying to accelerate meat alternatives, doing lots of polling around public opinions on moral questions around eating meat (with the goal of hopefully finding that humans think factory farming is wrong so a friendly AI system might adopt such a goal as well; human behavior in this regard seems like a particularly bad basis on which to train AIs). Pretty uncertain about these two ideas and I wouldn't be surprised if they're actually quite bad.
I agree that persuasion frames are often a bad way to think about community building.
I also agree that community members should feel valuable, much in the way that I want everybody in the world to feel valued/loved.
I probably disagree about the implications, as they are affected by some other factors. One intuition that helps me is to think about the donors who donate toward community building efforts. I expect that these donors are mostly people who care about preventing kids from dying of malaria, and many donors also donate lots of money towards charities that can save a kid’s life for $5000. They are, I assume, donating toward community building efforts because they think these efforts are on average a better deal, costing less than $5000 for a life saved in expectation.
For mental health reasons, I don’t think people should generally hold themselves to this bar and be like “is my expected impact higher than where money spent on me would go otherwise?” But I think when you’re using other people’s altruistic money to community build, you should definitely be making trade-offs, crunching numbers, and otherwise be aiming to maximize the impact from those dollars.
Furthermore, I would be extremely worried if I learned that community builders aren’t attempting to quantify their impact or think about these things carefully (noting that I have found it very difficult to quantify impact here). Community building is often indistinguishable (at least from the outside) from “spending money on ourselves” and I think it’s reasonable to have a super high bar for doing this in the name of altruism.
Noting again that I think it’s hard to balance mental health with the wacky terrible state of the world where a few thousand dollars can save a life. Making a distinction between personal dollars and altruistic dollars can perhaps help folks preserve their mental health while thinking rigorously about how to help others the most. Interesting related ideas:
https://www.lesswrong.com/posts/3p3CYauiX8oLjmwRF/purchase-fuzzies-and-utilons-separately https://forum.effectivealtruism.org/posts/zu28unKfTHoxRWpGn/you-have-more-than-one-goal-and-that-s-fine
Sorry about the name mistake. Thanks for the reply. I'm somewhat pessimistic about us two making progress on our disagreements here because it seems to me like we're very confused about basic concepts related to what we're talking about. But I will think about this and maybe give a more thorough answer later.
Edit: corrected name, some typos and word clarity fixed
Overall I found this post hard to read and I spent far too long trying to understand it. I suspect the author is about as confused about key concepts as I am. David, thanks for writing this, I am glad to see writing on this topic and I think some of your points are gesturing in a useful and important direction. Below are some tentative thoughts about the arguments. For each core argument I first try to summarize your claim and then respond; hopefully this makes it clearer where we actually disagree vs. where I am misunderstanding.
High level: The author makes a claim that the risk of deception arising is <1%, but they don’t provide numbers elsewhere. They argue that 3 conditions must all be satisfied for deception but none of them is likely. The “how likely” affects that 1% number. My evaluation of the arguments (below) is that for each of these conjunctive conditions my rough probabilities (where higher means deception more likely) are: (totally unsure can’t reason about it) * (unsure but maybe low) * (high), yielding an unclear but probably >1% probability.
FWIW I often vote on posts at the top without scrolling because I listened to the post via the Nonlinear podcast library or read it on a platform that wasn't logged in. Not all that important of a consideration, but worth being aware of.
How is the super-alignment team going to interface with the rest of the AI alignment community, and specifically what kind of work from others would be helpful to them (e.g., evaluations they would want to exist in 2 years, specific problems in interpretability that seem important to solve early, curricula for AIs to learn about the alignment problem while avoiding content we may not want them reading)?
To provide more context on my thinking that leads to this question: I'm pretty worried that OpenAI is making themselves a single point of failure in existential security. Their plan seems to be a less-disingenuous version of "we are going to build superintelligence in the next 10 years, and we're optimistic that our alignment team will solve catastrophic safety problems, but if they can't then humanity is screwed anyway, because as mentioned, we're going to build the god machine. We might try to pause if we can't solve alignment, but we don't expect that to help much." Insofar as a unilateralist is taking existentially risky actions like this and they can't be stopped, other folks might want to support their work to increase the chance of the super-alignment team's success. Insofar as I want to support their work, I currently don't know what they need.
Another framing behind this question is just "many people in the AI alignment community are also interested in solving this problem, how can they indirectly collaborate with you?" (Some people will want to directly collaborate, but this has a corporate-closed-ness limitation.)