Epistemic status: some thoughts I wanted to get out quickly
A lot of fantastic work has been done by people in the AI existential risk research community and related communities over the last several months in raising awareness about risks from advanced AI. However, I have some cause for unease that I’d like to share.
These efforts may have been too successful too soon.
Or, more specifically, this level of outreach success this far ahead of the development of AI capable of posing existential risk may have fallout. We should consider steps to mitigate this.
(1) Timelines
I know that there are well-informed people in the AI and existential risk communities who believe AI capable of posing existential risk may be developed within 10 years. I certainly can't rule this out, and even a small chance of this is worth working to prevent or mitigate to the extent possible, given the possible consequences. My own timelines are longer, although my intuitions don't have a rigorous model underpinning them (they are roughly in line with the 15-40 year timelines mentioned in this recent blog post by Matthew Barnett from Epoch).
Right now, the nature of media communications means that the message is coming across with a lot of urgency. From speaking to lay colleagues, the impression received often seems to be one of short timelines (and some figures, e.g. Geoffrey Hinton, have explicitly said 5-20 years, sometimes with uncertainty caveats and sometimes without).
It may be that those with short (<10 years) timelines are right. And even if they’re not, and we’ve got decades before this technology poses an existential threat, many of the attendant challenges – alignment, governance, distribution of benefits – will need that additional time to be addressed. And I think it’s entirely plausible that the current level of buy-in will be needed in order to initiate the steps needed to avoid the worst outcomes, e.g. recruiting expertise and resources to alignment, development and commitment to robust regulation, even coming to agreements not to pursue certain technological developments beyond a certain point.
However, if short timelines do not transpire, I believe there’s a need to consider a scenario I think is reasonably likely.
(2) Crying wolf
I propose that it is most likely we are in a world where timelines are >10 years, perhaps >20 or 30 years. Right now this issue has a lot of the most prominent AI scientists and CEOs signed up, and political leaders worldwide committing to examining the issue seriously (examples from last week). What happens, then, in the >10-year-timeline world?
The extinction-level outcomes that the public is hearing about, that these experts are raising, and in which policymakers are making costly reputational investments don't transpire. What does happen is all the benefits of near-term AI that have been talked about, plus all the near-term harms that are being predominantly raised by the AI ethics/FAccT communities. Perhaps these harms include somewhat more extreme versions than what is currently talked about, but nowhere near catastrophic. Suddenly the year is 2028, and that whole 2023 furore is starting to look a bit silly. Remember when everyone agreed AI was going to make us all extinct? Yeah, like Limits to Growth all over again. Except that we're not safe. In reality, in this scenario, we're just entering the period in which risk is most acute, and in which gaining or maintaining the support of leaders across society for coordinated action is most important. And it's possibly even harder to convince them, because people remember how silly lots of people looked the last time. [1] [2]
(3) How to navigate this scenario (in advance).
Suggestions:
- Have our messaging make clear that we don’t know when extinction-potential AI will be developed, and it’s quite likely that it will be over a decade, perhaps much longer. But it needs to be discussed now, because
  - we can't rule out that it will be developed sooner;
  - there are choices to be made now that will have longer-term consequences;
  - the challenges need a lot more dedicated time and effort than they've been getting.
Uncertainty is difficult to communicate in media, but it’s important to try.
- Don't be triumphalist about winning the public debate now; it may well be 'lost' again in 5 years.
- Don't unnecessarily antagonise the AI ethics/FAccT folk [3], because they're quite likely to look like the ones who were right in 5 years (and because it's just unhelpful).
- Build bridges where possible with the AI ethics/FAccT folk on a range of issues and interventions that seem set to overlap in that time; work together where possible. Lots of people from those communities are making proposals that are relevant to, and overlap with, the challenges associated with the path to transformative AI. This includes external evaluation; licensing and liability; oversight of powerful tech companies developing frontier AI; international bodies for governing powerful AI; and much more. E.g. see this and this, as well as CAIS's recent blog post.
- Don't get fooled into thinking everyone now agrees. A lot more senior names are now signing onto statements and speaking up, and this is making it easier for previously silent-but-concerned researchers to speak up too. However, I think a majority of AI researchers probably still don't agree that this is a serious, imminent concern (Yann LeCun's silent majority is probably still real), and this disconnect in perceptions may result in significant pushback to come.
- Think carefully about the potential political fallout if and when this becomes an embarrassing thing for the politicians who have spoken up, and how to manage this.
To sum up: I'm not saying it was wrong to push for this level of broad awareness and consensus-building; I think it may well turn out to be necessary this early in order to navigate the challenges on the path to transformative AI, even if we still have decades until that point (and we may not). But there's the potential for a serious downside/backlash that this community, and everyone who shares our concern about existential risk from AI, should be thinking carefully about, in terms of positioning for effectiveness on slightly longer timelines.
Thank you to Shakeel Hashim, Shahar Avin, Haydn Belfield and Ben Garfinkel for feedback on a previous draft of this post.
[1] Pushing against this, it seems likely that AI will have continued advancing as a technology, leading to ever-greater scientific and societal impacts. This may maintain or increase the salience of the idea that AI could pose extremely significant risks.
[2] A 'softer' version of this scenario is that some policy happens now, but then quietly drops off / gets dismantled over time, as political attention shifts elsewhere.
[3] I don't know how much this is happening in practice (there's just so much online discourse right now that it's hard to track), but I have seen it remarked on several times, e.g. here.
I suspect that if transformative AI is 20 or even 30 years away, AI will still be doing really big, impressive things in 2033, and people at that time will get a sense that even more impressive things are soon to come. In that case, I don't think many people will think that AI safety advocates in 2023 were crying wolf, since one decade is not very long, and the importance of the technology will have only become more obvious in the meantime.
Yes, I think this is plausible-to-likely, and is a strong counter-argument to the concern I raise here.
Hmm, fwiw, I spontaneously think something like this is overwhelmingly likely.
Even in the (imo unlikely) case of AI research basically stagnating from now on, I expect AI applications to have effects that will significantly affect the broader public and not make them think anything close to "what a nothingburger" (e.g. like I've heard it happen for nanotechnology). E.g. I'm thinking of things like the broad availability of personal assistants & AI companions, the automation of increasingly many tasks, and impacts on education and on the productivity of software developers.
And if we also see a stagnation of significant applications, I expect this would be caused by some external event (say, a severe economic or financial crisis) that would also keep people from thinking of the current moment as crying wolf.
I don't think that this is how policy discussions will actually work in those long-timeline worlds. Seeing impressive things means that people will benefit from AI systems, it will appear pretty harmless, and because the posited risks haven't actually caused large-scale harm, there will be less willingness to admit that the risks exist, and there will absolutely be claims that AI risk "doomers" cried wolf and slowed down something wonderful and "ultimately" harmless. (Until it turns out that the doomers were right, and are hailed as prophets, but too late to matter.)
On the other hand, the most likely alternative is that we see lots of near-term harms, and we get lots of opposition on the basis of job loss, misinformation, and similar - but I'm skeptical that this pushes in the direction of safer AI systems, and might instead simply lead to tons of research that increases capability and risk, but has limits on industry deployment.
I'd be very surprised if AI were predominantly considered risk-free in long-timelines worlds. The more AI is integrated into the world, the more it will interact with and cause harmful events/processes/behaviors/etc.; take, for example, the chatbot that apparently facilitated a suicide.
And I take Snoop Dogg's reaction to recent AI progress as somewhat representative of a more general attitude that will get stronger even with relatively slow and mostly benign progress.
I.e. it will continuously feel weird and novel and worth pondering where AI progress is going and where the risks are, and more serious people will join in doing this, which will again increase the credibility of those concerns.
"Considered risk free" is very different than what I discussed, which is that the broad public will see much more benefit, and have little direct experience of the types of harms that we're concerned about. Weird and novel won't change the public's minds about the technology, if they benefit, and the "more serious people" in the west who drive the narrative, namely, politicians, pundits, and celebrities, still have the collective attention span of a fish. And in the mean time, RLHF will keep LLMs from going rogue, they will be beneficial, and it will seem fine to everyone not thinking deeply about the risk.
FWIW I think that it's pretty likely that AGI etc. will happen within 10 years absent strong regulation, and moreover that if it doesn't, the 'crying wolf' effect will be relatively minor, enough that even if I had 20-year medians I wouldn't worry about it compared to the benefits.
I also guess crying-wolf effects won't be as large as one might think - e.g. I think people will look more at how strong AI systems appear at a given point than at whether people have previously warned about AI risk.
Yeah, I was going to post that tweet. I'd also like to mention my related thread that if you have a history of crying wolf, then when wolves do start to appear, you’ll likely be turned to as a wolf expert.
There's an additional problem that people who sound the alarms will likely be accused by some of "crying wolf" regardless of the outcome:
World A) Group X cries wolf. AI turns out not to be dangerous, and nothing bad happens. Group X (rightly) gets accused of crying wolf and loses credibility, even if AI does become dangerous at some future point.
World B) Group X cries wolf. AI is actually dangerous, but because they cried wolf, we manage the risk and there is no catastrophe. Seeing the absence of a catastrophe, some people will accuse group X of crying wolf and they lose credibility.
I gave an argument for why I don't think the crying-wolf effects would be as large as one might think in World A. Afaict your comment doesn't engage with my argument.
I'm not sure what you're trying to say with your comment about World B. If we manage to permanently solve the risks relating to AI, then we've solved the problem. Whether some people will then be accused of having cried wolf seems far less important relative to that.
You're right - my comment is addressing an additional problem. (So maybe I should've made it a standalone comment.)
As far as your second point is concerned - that's true, unless we face risk (again, and possibly more) at a later point. I agree with you that "crying-wolf effects" matter less or not at all under conditions where a problem is solved once and for all (unless it affects the credibility of a community which simultaneously works on other problems which remain unsolved, as is probably true of the EA community).
Sean - reasonable points.
Re. the crying wolf problem, it would be great to see some more systematic historical analyses of social/moral movements that have warned consistently of bad possible outcomes over many years, and then either succeeded or failed in being taken seriously over the long term by the public.
My casual impression is that there have been some activist movements that managed to sustain public concern about certain issues consistently for decades, despite worst-case scenarios not happening, e.g.
1) Nuclear war: the anti-nuke activists who have been warning about nuclear war since the 1960s have continued to be taken seriously right up to the present, despite nuclear war not happening.
2) Climate change: the eco-activists who warned about bad effects of global warming have been taken seriously by many people, including most mainstream media, ever since the 1990s, despite the worst predicted effects (e.g. mass starvation, huge sea level rises, hundreds of millions dead) not happening.
3) Antifa: the 'anti-fascist' movement on the political Left has been highly active since the 1990s, and has warned about a looming threat of a fascist/authoritarian takeover of Western democracies, which has never happened; yet their narrative continues to be taken seriously by many Left-leaning activists, journalists, and politicians.
4) AI itself: ever since the 1950s, science fiction authors, movie-makers, and futurists have warned about possible bad outcomes of AI, and those haven't really happened so far, yet most of the public does not discount those warnings, since they understand that the risk keeps ramping up as AI capabilities increase.
I'd welcome other examples or counter-examples.
Another example is Net Neutrality, where, despite advocates making very specific major predictions that were basically entirely falsified, I have never seen any serious negative consequences for those who argued that repealing net neutrality would be a disaster, nor seen anyone change their mind as a result about other related issues.
I don't think points about timelines reflect an accurate model of how AI regulations and guardrails are actually developed. What we need is for Congress to pass a law ordering some department within the executive branch to regulate AI, e.g. by developing permitting requirements or creating guidelines for legal AI research or whatever. Once this is done, the specifics of how AI is regulated are mostly up to that part of the executive branch, which can and will change over time.
Because of this, it is never "too soon" to order the regulation of AI. We may not know exactly what regulations would be like, but this is very unlikely to be written into law anyway. What we want right now is to create mechanisms to develop and enforce safety standards. Similar arguments apply to internal safety standards at companies developing AI capabilities.
It seems really hard for us to know exactly when AGI (or ASI or whatever you want to call it) is actually imminent. Even if it was possible, however, I just don't think last-minute panicking about AGI would actually accomplish much. It's all but impossible to quickly create societal consensus that the world is about to end before any harm has actually occurred. I feel like there's an unrealistic image of "we will panic and then everyone will agree to immediately stop AI research" implicit in this post. The smart thing to do is to develop mechanisms early and then use these mechanisms when we get closer to crunch time.
I don't think we need to worry too much about 'crying wolf'. The effects of media coverage and persuasive messaging on (1) attitudes and (2) perceived issue importance both substantially (though not necessarily entirely) wash out in a matter of weeks to months.
So I think we should be somewhat worried about wasted efforts -- not having a sufficiently concrete action plan to capitalise on the attention gained -- but not so much about lasting negative effects.
(More speculatively: I expect that there are useful professional field-building effects that will last longer than the public opinion effects, e.g. certain researchers deciding the issue now merits their attention, which make these efforts worthwhile anyway.)
Sharing a relevant blog post today by Harry Law on the limits to growth and predictions of doom, and lessons for AI governance, which cites this post.
Apologies that I still owe some replies to the discussion below; I've found it all really helpful (thank you!). I agree with those who say that it would be useful to have some deeper historical analysis of the impact of past 'doomer' predictions on credibility, which is clearly informative to the question of the weight we should assign to the 'crying wolf' concern.
https://www.harrylaw.co.uk/post/ai-governance-and-the-limits-to-growth
My question (similar to some of the comments below, especially @Geoffrey Miller's) is whether backlash from crying wolf is a meaningful phenomenon in the public sphere. It would be great if someone could share an example where people cried wolf about something and there was a real backlash - is there one? The nuclear example below is a great one, I think.
Opposition politicians around the world seem to know this well: they predict economic or moral catastrophe under the current government, and people almost never call them out on it when they are wrong and stability continues.
With even faster-moving news cycles and shorter attention spans, I think that, if anything, the crying wolf phenomenon will become less and less likely. Anything we can do to get AI in front of people's faces and stress its importance remains net good IMO.
Super great post. I've been thinking about posting a nuance in (what I think about) the Eliezer class of threat models but haven't gotten around to it. (Warning: negative valence, as I will recall the moment I first underwent visceral sadness at the alignment problem).
Rob Bensinger tweeted something like "if we stick the landing on this, I'm going to lose an unrecoverable amount of bayes points", and for two years already I've had a massively different way of thinking about deployment of advanced systems because I find something like a "law of mad science" very plausible.
The high level takeaway is that (in this class of threat models) we can "survive takeoff" (not that I don't hate that framing) and accumulate lots of evidence that the doomcoin landed on heads (really feeling like we're in the early stages of a glorious transhuman future or a more modest FALGSC), for hundreds of years. And then someone pushes a typo in a yaml file to the server, and we die.
There seems to be very little framing of the mostly Eliezer-like 'flipping the doomcoin' scenario in which forecasters thus far have only concerned themselves with the date of the first flip, but from then on the doomcoin is flipped on New Year's Eve at midnight every year until it comes up tails and we die. In other words, if we are obligated to hustle the weight of the doomcoin now, before the first flip, then we are at least as obligated to apply at least constant vigilance, forevermore, and there's a stronger case to be made for demanding strictly increasing vigilance (pulling the weight of the doomcoin further and further every year). (This realization was my visceral sadness moment, in 2021 on Discord, whereas before I was thinking about threat models as like a fun and challenging video game RNG or whatever.)
I think the Oxford folks have some literature on "existential security", which I just don't buy or expect at all. It seems deeply unlikely to me that there will be tricks we can pull after the first time the doomcoin lands on heads to keep it from flipping again. I think the "pivotal act" literature from MIRI tries to discuss this, by thinking about ways we can get some freebie years thrown in there (New Year's Eve parties with no doomcoin flip), which is better than nothing. But this constant/increasing vigilance factor, or the repeated flips of the doomcoin, seems like a niche informal inside view among people who've been hanging out for longer than a couple of years.
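[Editorial aside: a minimal sketch of the arithmetic behind this "repeated flips" framing, under an assumed constant per-year risk; the 1% annual figure is an arbitrary illustration, not a number from the comment.]

```python
# Illustrative sketch only: how a constant per-year "doomcoin" risk compounds.
# The 1% annual risk figure is an assumption chosen purely for illustration.

def cumulative_doom_probability(annual_risk: float, years: int) -> float:
    """Chance that at least one flip comes up 'tails' within `years` flips,
    assuming one independent flip per year with probability `annual_risk`."""
    return 1 - (1 - annual_risk) ** years

for years in (10, 50, 100, 500):
    p = cumulative_doom_probability(0.01, years)
    print(f"{years:>3} years at 1%/year: {p:.0%} cumulative risk")
# Prints roughly 10%, 39%, 63%, 99%: surviving the first flip says little about
# surviving the following centuries unless the per-year risk is driven down.
```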
Picking on Eliezer as a public intellectual for a second: insofar as my model of him is accurate (that his confidence that we die is more of an "eventually" thing, and that he has very little relation to Conjecture, who in many worlds will just take a hit to their Brier score in 2028, which Eliezer will be shielded from because he doesn't commit to dates), I would have liked to see him retweet the Bensinger comment and warn us about all the ways in which we could observe wildly transformative AI not kill everyone, declare victory, then a few hundred years later push a bad yaml file to the server and die.
(All of this modulo my feeling that "doomcoin" is an annoying and thought-destroying way of characterizing the distribution over how you expect things to go well and poorly, probably at the same time, but that's its own jar of paperclips.)
I think that's strongly contra Eliezer's model, which is shaped something like "succeeding at solving the alignment problem eliminates most sources of existential risk, because aligned AGI will in fact be competent to solve for them in a robust way". This does obviously imply something about the ability of random humans to ~~spin up unmonitored nanofactories~~ push a bad yaml file. Maybe there'll be some much more clever solution(s) for various possible problems? /shrug
Yeah, I think "ASI implies an extreme case of lock-in" is a major tendency in the literature (especially sequences-era), but 1. people disagree about whether "alignment" refers to something that outsmarts even this implication or not, and then they disagree about the relative tractability and plausibility of the different alignment visions, and 2. this is very much a separate set of steps that provide room for disagreement among people who broadly accept Eliezer-like threat models (doomcoin stuff).
I don't want to zero in on actually-existing Eliezer (at whichever time step); I'm more interested in something like a threat-model class or cluster around the lack of fire alarms, capabilities we can't distinguish from magic, things of that nature.
Appreciate the post! A couple of thoughts:
Thanks! Re:
1. I think this is plausible (though I'm unclear on whether you mean 'we as the AI risk research community' or 'we as humanity' here).
2. This bias definitely exists, but AI in the last year has cut through to broader society in a huge way (I keep overhearing conversations about ChatGPT and other things in cafes, on trains, etc., admittedly in the Cambridge/London area; suddenly random family members have takes, etc. It's showing up in my wife's social media, and being written about by the political journalists she follows, where it never had been before). Ditto (although to a smaller extent) AI x-risk. EA/FTX didn't cut through to anything like the same extent.
This really does hinge on timelines. Post-GPT-4, I (and many others) have had a realisation that it seems likely that we basically have the AGI paradigm already (foundation models + planners/plugins), and are therefore in an emergency situation.
I'm willing to burn social as well as financial capital on saying this loudly. If in 5 years' time we are still here for reasons other than a global AGI moratorium happening, then I'll be happy to take the hit, as at least we won't all be dead.
Is the parent comment really so bad that it warrants being downvoted to the point of being hidden?
I think that's right (that it hinges on timelines). Other than the first, I think most of my suggestions come at minimal cost in the short-timelines world, and will help with minimising friction/reputational hit in the long-timelines world. Re: the first, not delivering the strongest (and least hedged) version of the argument may weaken the message for the short-timelines world. But I note that even within this community there is wide uncertainty and disagreement re: timelines; very short timelines are far from consensus.
I want to be clear for the record here that this is enormously wrong, and Greg Colbourn's advice should not be heeded unless someone else checks the facts/epistemics of his announcements, due to past issues with his calls for alarm.
I've detailed my reasoning in my posts. They are open for people to comment on them and address things at the object level. Please do so rather than cast aspersions.
Or at least link to these "past issues" you refer to.