Full version on arXiv | X
Executive summary
AI risk scenarios usually portray a relatively sudden loss of human control to AIs that outmaneuver individual humans and human institutions, due to a sudden increase in AI capabilities or a coordinated betrayal. However, we argue that even an incremental increase in AI capabilities, without any coordinated power-seeking, poses a substantial risk of eventual human disempowerment. This loss of human influence will be centrally driven by the availability of more competitive machine alternatives to humans in almost all societal functions, such as economic labor, decision making, artistic creation, and even companionship.
A gradual loss of control of our own civilization might sound implausible. Hasn't technological disruption usually improved aggregate human welfare? We argue that the alignment of societal systems with human interests has been stable only because of the necessity of human participation for thriving economies, states, and cultures. Once this human participation gets displaced by more competitive machine alternatives, our institutions' incentives for growth will be untethered from a need to ensure human flourishing. Decision-makers at all levels will soon face pressures to reduce human involvement across labor markets, governance structures, cultural production, and even social interactions. Those who resist these pressures will eventually be displaced by those who do not.
Still, wouldn't humans notice what's happening and coordinate to stop it? Not necessarily. What makes this transition particularly hard to resist is that pressures on each societal system bleed into the others. For example, we might attempt to use state power and cultural attitudes to preserve human economic power. However, the economic incentives for companies to replace humans with AI will also push them to influence states and culture to support this change, using their growing economic power to shape both policy and public opinion, which will in turn allow those companies to accrue even greater economic power.
Once AI has begun to displace humans, existing feedback mechanisms that encourage human influence and flourishing will begin to break down. For example, states funded mainly by taxes on AI profits instead of their citizens' labor will have little incentive to ensure citizens' representation. This could occur at the same time as AI provides states with unprecedented influence over human culture and behavior, which might make coordination amongst humans more difficult, thereby further reducing humans' ability to resist such pressures. We describe these and other mechanisms and feedback loops in more detail in this work.
Though we provide some proposals for slowing or averting this process, and survey related discussions, we emphasize that no one currently has a concrete, plausible plan for stopping gradual human disempowerment, and that methods of aligning individual AI systems with their designers' intentions are not sufficient. Because this disempowerment would be global and permanent, and because human flourishing requires substantial resources in global terms, it could plausibly lead to human extinction or similar outcomes.
Do you have any thoughts on the argument I recently gave that gradual and peaceful human disempowerment could be a good thing from an impartial ethical perspective?
Historically, it is common for groups to decline in relative power as a downstream consequence of economic growth and technological progress. As a chief example, the aristocracy declined in influence as a consequence of the industrial revolution. Yet this transformation is generally not considered a bad thing, for two reasons. Firstly, since the world is not zero-sum, individual aristocrats did not necessarily experience declining well-being despite the relative disempowerment of their class as a whole. Secondly, the world does not merely consist of aristocrats, but rather contains a multitude of moral patients whose agency deserves respect from the perspective of an impartial utilitarian. Specifically, non-aristocrats were largely made better off by industrial developments.
Applying this analogy to the present situation with AI, my argument is that even if AIs pursue separate goals from humans and increase in relative power over time, they will not necessarily make individual humans worse off, since the world is not zero-sum. In other words, there is ample opportunity for peaceful and mutually beneficial trade with AIs that do not share our utility functions, which would make both humans and AIs better off. Moreover, AIs themselves may be moral patients whose agency should be given consideration. Just as most of us think it is good that human children are allowed to grow, develop into independent people, and pursue their own goals—as long as this is done peacefully and lawfully—agentic AIs should be allowed to do the same. There seems to be a credible possibility of a flourishing AI civilization in the future, even if humans are relatively disempowered, and this outcome could be worth pushing for.
From a preference utilitarian perspective, it is quite unclear that we should prioritize human welfare at all costs. The boundary between biological minds and silicon-based minds seems quite arbitrary from an impartial point of view, making it a fragile foundation for developing policy. There are much more plausible moral boundaries—such as the distinction between sentient minds and non-sentient minds—which do not cut cleanly between humans and AIs. Therefore, framing the discussion solely in terms of human disempowerment seems like a mistake to me.
What would humans have to offer AIs for trade in this scenario, where there are "more competitive machine alternatives to humans in almost all societal functions"?
What do these words even mean in an ASI context? If humans are relatively disempowered, that disempowerment would presumably extend to the use of force and to legal contexts as well.