What's your credence that humans create a utopia in the alternative? Depending on the strictness of one's definition, I think a future utopia is quite unlikely either way, whether we solve alignment or not.
It seems you expect future unaligned AIs will either be unconscious or will pursue goals that result in few positive conscious experiences being created. I am not convinced of this myself. At the very least, I think such a claim demands justification.
Given the apparent ubiquity of consciousness in the animal kingdom, and the anticipated sophistication of AI cognition, it is difficult for me to imagine a future essentially devoid of conscious life, even if that life is made of silicon and does not share human preferences.
This argument only makes sense if you have a very low P(doom) (like <0.1%) or if you place minimal value on future generations. Otherwise, it's not worth recklessly endangering the future of humanity to bring utopia a few years (or maybe decades) sooner. The math on this is really simple—bringing AI sooner only benefits the current generation, but extinction harms all future generations. You don't need to be a strong longtermist, you just need to accord significant value to people who aren't born yet.
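To make the shape of that math explicit, here is a minimal back-of-envelope sketch in Python. Every number in it is a hypothetical placeholder chosen for illustration, not an estimate I'm defending:

```python
# Toy expected-value comparison (all numbers are hypothetical placeholders).
# "Accelerate" benefits only the current generation but risks extinction;
# "Delay" forgoes that benefit but preserves all future generations.

p_doom = 0.05              # placeholder probability that rushed AI causes extinction
current_gen_value = 1.0    # welfare of the current generation (normalized units)
future_gens_value = 100.0  # combined welfare of all future generations
accel_benefit = 0.2        # extra welfare the current generation gains from earlier AI

ev_accelerate = (1 - p_doom) * (current_gen_value + accel_benefit + future_gens_value)
ev_delay = current_gen_value + future_gens_value

print(f"Accelerate: {ev_accelerate:.2f}")  # 96.14
print(f"Delay:      {ev_delay:.2f}")       # 101.00
```

With these placeholders, acceleration only comes out ahead if p_doom drops below roughly 0.2%, or if future generations are given very little weight, which is exactly the structure of the claim above.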
Here's a counter-argument that relies on the following assumptions:
First, suppose you believe unaligned AIs would still be conscious entities, capable of having meaningful, valuable experiences. This could be because you think unaligned AIs will be very cognitively sophisticated, even if they don't share human preferences.
Second, assume you're a utilitarian who doesn't assign special importance to whether the future is populated by biological humans or digital minds. If both scenarios result in a future full of happy, conscious beings, you’d view them as roughly equivalent. In fact, you might even prefer digital minds if they could exist in vastly larger numbers or had features that enhanced their well-being relative to biological life.
With those assumptions in place, consider the following dilemma:
If AI is developed soon, there’s some probability p that billions of humans will die due to misaligned AI—an obviously bad outcome. However, if these unaligned AIs replace us, they would presumably still go on to create a thriving and valuable civilization from a utilitarian perspective, even though humanity would not be part of that future.
If AI development is delayed by several decades to ensure safety, billions of humans will die in the meantime from old age who could otherwise have been saved by accelerated medical advancements enabled by earlier AI. This, too, is clearly bad. However, humanity would eventually develop AI safely and go on to build a similarly valuable civilization, just after a significant delay.
Given these two options, a utilitarian doesn't have strong reasons to prefer the second approach. While the first scenario carries substantial risks, it does not necessarily endanger the entire long-term future. Instead, the primary harm seems to fall on the current generation: either billions of people die prematurely due to unaligned AI, or they die from preventable causes like aging because of delayed technological progress. In both cases, the far future—whether filled with biological or digital minds—remains intact and flourishing under these assumptions.
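To see why the far future drops out of the comparison, here is a toy calculation under the two assumptions above. The specific numbers are arbitrary placeholders; only the structure matters:

```python
# Toy comparison under the dilemma's assumptions (all numbers are placeholders).
# Assumption 1: unaligned AIs would still build a valuable civilization.
# Assumption 2: biological and digital futures count roughly equally.

p = 0.3                    # placeholder chance that misaligned AI kills billions of humans
far_future_value = 1000.0  # value of the long-run civilization (same in both branches)
current_gen_value = 1.0    # value of the current generation living out good lives

# Option 1: build AI soon. With probability p much of the current generation is
# lost, but (by assumption) the far future is still realized by unaligned AIs.
ev_build_soon = (1 - p) * current_gen_value + far_future_value

# Option 2: delay for decades. A large share of the current generation dies of
# aging that earlier AI-driven medicine might have prevented; the far future is
# still realized, just later.
lost_to_aging = 0.5        # placeholder fraction of current-generation value lost to delay
ev_delay = (1 - lost_to_aging) * current_gen_value + far_future_value

print(f"{ev_build_soon:.1f}")  # 1000.7
print(f"{ev_delay:.1f}")       # 1000.5
```

The far-future term is identical in both branches by assumption, so the comparison collapses to which option is worse for the current generation; the particular placeholder values above aren't meant to settle that question.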
In other words, there simply isn't a compelling utilitarian argument for choosing to delay AI in this dilemma.
Do you have any thoughts on the assumptions underlying this dilemma, or its conclusion?
TL;DR...
Restrictions to advanced AI would likely delay technological progress and potentially require a state of surveillance.
To be clear, I wasn't arguing against generic restrictions on advanced AIs. In fact, I advocated for restrictions, in the form of legal protections for AIs against abuse and suffering. In my comment, I was solely arguing against a lengthy moratorium, rather than arguing against more general legal rules and regulations.
Given my argument, I'd go further than saying that the relevant restrictions I was arguing against would "likely delay technological progress". They almost certainly would have that effect, since I was talking about a blanket moratorium, rather than more targeted or specific rules governing the development of AI (which I support).
I think what is missing for this argument to go through is arguing that the costs in 2 are higher than the cost of mistreated Artificial Sentience.
A major reason why I didn't give this argument was that I had already conceded that we should have legal protections against the mistreatment of Artificial Sentience. The relevant comparison is not between a scenario with no restrictions on mistreatment vs. restrictions that prevent AI mistreatment, but rather between the moratorium discussed in the post vs. more narrowly scoped regulations that specifically protect AIs from mistreatment.
Let me put this another way. Let's say we were to impose a moratorium on advanced AI, for the reasons given in this post. The idea here is presumably that, during the moratorium, society will deliberate on what we should do with advanced AI. After this deliberation concludes, society will end the moratorium, and then implement whatever we decided on.
What types of things might we decide to do, while deliberating? A good guess is that, upon the conclusion of the moratorium, we could decide to implement strong legal protections against AI mistreatment. In that case, the result of the moratorium appears identical to the legal outcome that I had already advocated, except with one major difference: with the moratorium, we'd have spent a long time with no advanced AI.
It could well be the case that spending, say, 50 years with no advanced AI is better than building it during that period—from a utilitarian point of view—because AIs might suffer on balance more than they are happy, even with strong legal protections. If that is the case, the correct conclusion to draw is that we should never build AI, not that we should spend 50 years deliberating. Since I didn't think this was the argument being presented, I didn't spend much time arguing against the premise supporting this conclusion.
Instead, I wanted to focus on the costs of delay and deliberation, which I think are quite massive and often overlooked. Given these costs, if the end result of the moratorium is that we merely end up with the same sorts of policies that we could have achieved without the delay, the moratorium seems flatly unjustified. If the result of the moratorium is that we end up with even worse policies, as a result of the cultural effects I talked about, then the moratorium is even less justified.
(I'm repeating something I said in another comment I wrote a few hours ago, but adapted to this post.)
On a basic level, I agree that we should take artificial sentience extremely seriously, and think carefully about the right type of laws to put in place to ensure that artificial life is able to happily flourish, rather than suffer. This includes enacting appropriate legal protections to ensure that sentient AIs are treated in ways that promote well-being rather than suffering. Relying solely on voluntary codes of conduct to govern the treatment of potentially sentient AIs seems deeply inadequate, much like it would be for protecting children against abuse. Instead, I believe that establishing clear, enforceable laws is essential for ethically managing artificial sentience.
That said, I'm skeptical that a moratorium is the best policy.
From a classical utilitarian perspective, the imposition of a lengthy moratorium on the development of sentient AI seems like it would help to foster a more conservative global culture—one that is averse towards not only creating sentient AI, but also potentially towards other forms of life-expanding ventures, such as space colonization. Classical utilitarianism is typically seen as aiming to maximize the number of conscious beings in existence, advocating for actions that enable the flourishing and expansion of life, happiness, and fulfillment on as broad a scale as possible. However, implementing and sustaining a lengthy ban on AI would likely require substantial cultural and institutional shifts away from these permissive and ambitious values.
To enforce a moratorium of this nature, societies would likely adopt a framework centered on caution, restriction, and a deep-seated aversion to risk—values that would contrast sharply with those that encourage creating sentient life and proliferating this life on as large a scale as possible. Maintaining a strict stance on AI development might lead governments, educational institutions, and media to promote narratives emphasizing the potential dangers of sentience and AI experimentation, instilling an atmosphere of risk-aversion rather than curiosity, openness, and progress. Over time, these narratives could lead to a culture less inclined to support or value efforts to expand sentient life.
Even if the ban is at some point lifted, there's no guarantee that the conservative attitudes generated under the ban would entirely disappear, or that all relevant restrictions on artificial life would completely go away. Instead, it seems more likely that many of these risk-averse attitudes would remain even after the ban is formally lifted, given the initially long duration of the ban, and the type of culture the ban would inculcate.
In my view, this type of cultural conservatism seems likely to, in the long run, undermine the core aims of classical utilitarianism. A shift toward a society that is fearful or resistant to creating new forms of life may restrict humanity’s potential to realize a future that is not only technologically advanced but also rich in conscious, joyful beings. If we accept the idea of 'value lock-in'—the notion that the values and institutions we establish now may set a trajectory that lasts for billions of years—then cultivating a culture that emphasizes restriction and caution may have long-term effects that are difficult to reverse. Such a locked-in value system could close off paths to outcomes that are aligned with maximizing the proliferation of happy, meaningful lives.
Thus, if a moratorium on sentient AI were to shape society's cultural values in a way that leans toward caution and restriction, I think the enduring impact would likely contradict classical utilitarianism's ultimate goal: the maximal promotion and flourishing of sentient life. Rather than advancing a world with greater life, joy, and meaningful experiences, these shifts might result in a more closed-off, limited society, actively impeding efforts to create a future rich with diverse and conscious life forms.
(Note that I have talked mainly about these concerns from a classical utilitarian point of view. However, I concede that a negative utilitarian or antinatalist would find it much easier to rationally justify a long moratorium on AI.
It is also important to note that my conclusion holds even if one does not accept the idea of a 'value lock-in'. In that case, longtermists should likely focus on the near-term impacts of their decisions, as the long-term impacts of their actions may be impossible to predict. And I'd argue that a moratorium would likely have a variety of harmful near-term effects.)
Given your statement that "a 50-year delay in order to make this monumentally important choice properly would seem to be a wise and patient decision by humanity", I'm curious if you have any thoughts on the comment I just wrote, particularly the part arguing against a long moratorium on creating sentient AI, and how this can be perceived from a classical utilitarian perspective.
On a basic level, I agree that we should take artificial sentience extremely seriously, and think carefully about the right type of laws to put in place to ensure that artificial life is able to happily flourish, rather than suffer. This includes enacting appropriate legal protections to ensure that sentient AIs are treated in ways that promote well-being rather than suffering. Relying solely on voluntary codes of conduct to govern the treatment of potentially sentient AIs seems deeply inadequate, much like it would be for protecting children against abuse. Instead, I believe that establishing clear, enforceable laws is essential for ethically managing artificial sentience.
However, it currently seems likely to me that sufficiently advanced AIs will be sentient by default. And if advanced AIs are sentient by default, then instituting a temporary ban on sentient AI development, say for 50 years, would likely be functionally equivalent to pausing the entire field of advanced AI for that period.
Therefore, despite my strong views on AI sentience, I am skeptical about the idea of imposing a moratorium on creating sentient AIs, especially in light of my general support for advancing AI capabilities.
The idea that sufficiently advanced AIs will likely be sentient by default can be justified by three basic arguments:
My skepticism of a general AI moratorium contrasts with the views of (perhaps) most EAs, who appear to favor such a ban, for both AI safety reasons and to protect AIs themselves (as you argue here). I'm instead inclined to highlight the enormous costs of such a ban, compared to a variety of cheaper alternatives, such as targeted regulation that merely ensures AIs are strongly protected against abuse. These costs appear to include:
Moreover, from a classical utilitarian perspective, the imposition of a 50-year moratorium on the development of sentient AI seems like it would help to foster a more conservative global culture—one that is averse towards not only creating sentient AI, but also potentially towards other forms of life-expanding ventures, such as space colonization. Classical utilitarianism is typically seen as aiming to maximize the number of conscious beings in existence, advocating for actions that enable the flourishing and expansion of life, happiness, and fulfillment on as broad a scale as possible. However, implementing and sustaining a lengthy ban on AI would likely require substantial cultural and institutional shifts away from these permissive and ambitious values.
To enforce a moratorium of this nature, societies would likely adopt a framework centered on caution, restriction, and a deep-seated aversion to risk—values that would contrast sharply with those that encourage creating sentient life and proliferating this life on as large a scale as possible. Maintaining a strict stance on AI development might lead governments, educational institutions, and media to promote narratives emphasizing the potential dangers of sentience and AI experimentation, instilling an atmosphere of risk-aversion rather than curiosity, openness, and progress. Over time, these narratives could lead to a culture less inclined to support or value efforts to expand sentient life.
Even if the ban is at some point lifted, there's no guarantee that the conservative attitudes generated under the ban would entirely disappear, or that all relevant restrictions on artificial life would completely go away. Instead, it seems more likely that many of these risk-averse attitudes would remain even after the ban is formally lifted, given the initially long duration of the ban, and the type of culture the ban would inculcate.
In my view, this type of cultural conservatism seems likely to, in the long run, undermine the core aims of classical utilitarianism. A shift toward a society that is fearful or resistant to creating new forms of life may restrict humanity’s potential to realize a future that is not only technologically advanced but also rich in conscious, joyful beings. If we accept the idea of 'value lock-in'—the notion that the values and institutions we establish now may set a trajectory that lasts for billions of years—then cultivating a culture that emphasizes restriction and caution may have long-term effects that are difficult to reverse. Such a locked-in value system could close off paths to outcomes that are aligned with maximizing the proliferation of happy, meaningful lives.
Thus, if a moratorium on sentient AI were to shape society's cultural values in a way that leans toward caution and restriction, I think the enduring impact would likely contradict classical utilitarianism's ultimate goal: the maximal promotion and flourishing of sentient life. Rather than advancing a world with greater life, joy, and meaningful experiences, these shifts might result in a more closed-off, limited society, actively impeding efforts to create a future rich with diverse and conscious life forms.
(Note that I have talked mainly about these concerns from a classical utilitarian point of view, and a person-affecting point of view. However, I concede that a negative utilitarian or antinatalist would find it much easier to rationally justify a long moratorium on AI.
It is also important to note that my conclusion holds even if one does not accept the idea of a 'value lock-in'. In that case, longtermists should likely focus on the near-term impacts of their decisions, as the long-term impacts of their actions may be impossible to predict. And my main argument here is that the near-term impacts of such a moratorium are likely to be harmful in a variety of ways.)
Humans don't like shocks. Explosive growth would definitely be a shock. We tend to like very gradual changes, or brief flirts with big change.
Speaking generally, it is true that humans are frequently hesitant to change the status quo, and economic shocks can be quite scary to people. This provides one reason to think that people will try to stop explosive growth, and slow down the rate of change.
On the other hand, it's important to recognize the individual incentives involved here. On an individual, personal level, explosive growth is equivalent to a dramatic rise in real income over a short period of time. Suppose you were given the choice of increasing your current income by several-fold over the next few years. For example, if your real income is currently $100,000/year, then you would see it increase to $300,000/year in two years. Would you push back against this change? Would this rise in your personal income be too fast for your tastes? Would you try to slow it down?
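As a quick sanity check on what that hypothetical example implies, tripling one's income in two years corresponds to an annual growth rate of roughly 73%, compared with the ~2% per year typical of rich countries in recent history:

```python
# Implied annual growth rate if income rises from $100,000 to $300,000 in two years
# (purely illustrative numbers taken from the example above).
start, end, years = 100_000, 300_000, 2
annual_growth = (end / start) ** (1 / years) - 1
print(f"{annual_growth:.0%}")  # 73%
```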
Even if explosive growth is dramatic and scary on a collective and abstract level, it is not clearly bad on an individual level. Indeed, it seems quite clear to me that most people would be perfectly happy to see their incomes rise dramatically, even at a rate that far exceeded historical norms, unless they recognized a substantial and grave risk that would accompany this rise in their personal income.
If we assume that people collectively follow what is in each of their individual interests, then we should conclude that incentives are pretty strongly in favor of explosive growth (at least when done with low risk), despite the fact that this change would be dramatic and large.
In general, to me it seems quite fruitful to examine in more detail whether, in fact, multipolarity of various kinds might alleviate concerns about value fragility. And to those who have the intuition that it would (especially in cases, like Multipolar value fragility, where agent A’s exact values aren’t had by any of agents 1-n), I’d be curious to hear the case spelled out in more detail.
Here's a case that I roughly believe: multipolarity raises the likelihood that one's own values will be represented, because it gives each agent the opportunity to literally live in the world and act within it to bring about the outcomes they personally want.
This case is simple enough, and it's consistent with the ordinary multipolarity the world already experiences. Consider an entirely selfish person. Now, divide the world into two groups: the selfish person (which we call Group A) and the rest of the world (which we call Group B).
Group A and Group B have very different values, even "upon reflection". Group B is also millions or billions of times more powerful than Group A (as it comprises the entire world minus the selfish individual). Therefore, on a naive analysis, you might expect Group B to "take over the world" and then implement its values without any regard whatsoever to Group A. Indeed, because of the vast power differential, it would be "very easy" for Group B to achieve this world takeover. And such an outcome would indeed be very bad according to Group A's values.
Of course, this naive analysis is flawed, because the real world is multipolar in an important respect: usually, Group B will let Group A (the individual) have some autonomy, and let them receive a tiny fraction of the world's resources, rather than murdering Group A and taking all their stuff. They will do this because of laws, moral norms, and respect for one's fellow human. This multipolarity therefore sidesteps all the issues with value fragility, and allows Group A to achieve a pretty good outcome according to their values.
This is also my primary hope with misaligned AI. Even if misaligned AIs are collectively millions or billions of times more powerful than humans (or aligned AIs), I would hope they would still allow the humans or aligned AIs to have some autonomy, leave us alone, and let us receive a sufficient fraction of resources that we can enjoy an OK outcome, according to our values.
I would go even further than the position argued in this paper. This paper focuses on whether we should give agentic AIs certain legal rights (the right to make contracts, hold property, and bring tort claims), but I also think that, as an empirical matter, we probably will do so. I have two main justifications for my position here:
Beyond the question of whether AIs should or will receive basic legal rights in the future, there are important remaining questions about how post-AGI law should be structured. For example:
I believe these questions, among others, deserve more attention among those interested in AGI governance.
I agree that this is the standard story regarding AI risk, but I haven’t seen convincing arguments that support this specific model.
In other words, I see no compelling evidence to believe that future AIs will have exclusively abstract, disconnected goals—like maximizing paperclip production—and that such AIs would fail to generate significant amounts of happiness, either as a byproduct of their goals or as an integral part of achieving them.
(Of course, it’s crucial to avoid wishful thinking. A favorable outcome is by no means guaranteed, and I’m not arguing otherwise. Instead, my point is that the core assumption underpinning this standard narrative seems weakly argued and poorly substantiated.)
The scenario I find most plausible is one in which AIs have a mixture of goals, much like humans. Some of these goals will likely be abstract, while others will be directly tied to the AI’s internal experiences and mental states.
Just as humans care about their own happiness but also care about external reality—such as the impact they have on the world or what happens after they’re dead—I expect that many AIs will place value on both their own mental states and various aspects of external reality.
This ultimately depends on how AIs are constructed and trained, of course. However, as you mentioned, there are some straightforward reasons to anticipate parallels between how goals emerge in animals and how they might arise in AIs. For example, robots and some other types of AIs will likely be trained through reinforcement learning. While RL on computers isn’t identical to the processes by which animals learn, it is similar enough in critical ways to suggest that these parallels could have significant implications.