In another post, I published four hypotheses on uncontrollable AI as an existential risk. Here I give some background on why I chose this particular framing.
Talking about existential risks from AI with people outside the AI alignment or effective altruism communities can be quite frustrating. What seems obvious to people in the community is often met with deep skepticism by others, which makes a productive discussion difficult. As a result, what is arguably the most urgent problem we face today is almost completely neglected by politicians and the broader scientific community.
I think there are many reasons for this. I don’t claim to have analyzed them thoroughly, so there is more work to be done here. But from my own experience, a few key factors make it difficult for people to see that there is a real problem that needs to be addressed now. In my opinion, the most important one is that existential risks from AI are usually linked to terms like “AGI”, “transformative AI (TAI)” or even “superintelligence”. This has several drawbacks.
First of all, these terms are quite vague, so it is easy for two people to have very different understandings of them. Also, vague problems are much easier to ignore than concrete ones.
Second, these terms (with the possible exception of TAI) are anthropomorphic: the human mind is treated as the ultimate benchmark of intelligence, and it is implicitly assumed that as long as AIs are not “smarter than us”, there is no existential risk. Also, the expected timeline for AGI depends heavily on one’s individual view of how complex the human mind really is. If, for example, you believe that “consciousness” is in principle unachievable in a machine, you’ll probably think that AGI is impossible, or at least a very long way off, and that there is therefore no reason to be concerned. Even if you think that consciousness can in principle be achieved in computers, you might equate “developing AGI” with “simulating the human brain” or at least “fully understanding the workings of the brain”. This is grossly misleading. While it seems plausible that AGI or superintelligence would indeed pose an existential risk, it is by no means clear that being superior to the human mind in most respects is a necessary condition for that. In particular, “consciousness” in the sense philosophers or psychologists usually mean is probably not needed.
Third, AGI and superintelligence are often associated with science fiction, so it is easy to dismiss them as “not real”, especially in the absence of concrete proof that we’re close to developing AGI.
Fourth, there is an inherent conflict of interest: leading labs like DeepMind and OpenAI are committed to developing AGI, so any attempt to ban it, or even to slow down research because it might pose an existential risk, would likely be met with fierce opposition. The same may be true for decision-makers outside of AI who are committed to “free markets” and/or have a strongly positive view of technology in general. For this reason, people concerned about existential risks from AI are sometimes compared to Luddites or simply ridiculed, as Andrew Ng did with his famous comparison to “fear of overpopulation on Mars”.
To avoid these problems and foster a productive discussion about the benefits and risks of advanced AI, I propose that we talk about the risks of “uncontrollable AI” instead of AGI or superintelligence. By “uncontrollable”, I mean that the AI is able to counter most measures humans take to either limit its ability to act or correct its decisions. More details can be found in hypothesis 1 of my other post. Apart from avoiding most of the problems mentioned above, “uncontrollable AI” is clearly a term that invites caution. Most people will see an “uncontrollable” technology as inherently bad and something to be avoided. I suspect few AI developers would object to the claim that “uncontrollable AI would be a bad thing”.
Yampolskiy and others have given convincing arguments that any superintelligent AI would be uncontrollable. But it is not at all clear that an AI has to be superintelligent to be uncontrollable. It may be sufficient that it is good enough at manipulating humans and/or technology to beat us at what I call the “dominance game”. It is currently unclear what exactly the necessary conditions for that are, so this is a promising and important field for research.
It should be obvious that an uncontrollable AI pursuing the wrong goal would pose an existential threat. On the other hand, an uncontrollable AI pursuing the “right” goal might be beneficial to the future of humanity (although I am personally doubtful of that). However, in that case, the burden of proof clearly lies with whoever proposes to develop such an AI. It seems reasonable, for example, to ban the development of uncontrollable AI unless it is provably beneficial. Even without a formal ban, a shared global understanding among AI developers that uncontrollable AI is to be avoided under all circumstances, at least until the value alignment problem has been solved, would significantly reduce the risk of creating such a system.
By reframing the problem from AGI/superintelligence risks to risks from uncontrollable AI, I hope that we’ll have a more open and productive discussion about the specific problems in AI development we need to avoid. It might also enable us to investigate in more detail what exactly makes an AI uncontrollable, and where to draw “red lines” so that we can develop advanced AI safely.
I like the framing and would like to see more discussion around this! I've left a comment on this post that has some overlap.
Thanks for that!