Currently it looks like we could have this type of agentic AI quite soon, say in 15 years. That's so soon that we (currently existing humans) could in the future be deprived of wealth and power by an exploding number of AI agents if we grant them a nonnegligible amount of rights. This could be quite bad for future welfare, including both our future preferences and our future wellbeing. So we shouldn't make such agents in the first place.
It is essential to distinguish between absolute and relative wealth in this discussion, since one of my key arguments depends on this distinction. Specifically, if my claims about the practical effects of population growth are correct, then a massive increase in the AI population would likely enrich the current inhabitants of the world (those who existed prior to the population explosion) in absolute terms, raising their standard of living. At the same time, their relative share of the world's resources and influence would decline as a result of the population growth.
If you disagree with this conclusion, it seems there are two primary ways to challenge it: (1) deny that agentic AIs could be constrained by a legal system into cooperating and trading peacefully with humans, rather than expropriating them; or (2) deny that population growth enriches, rather than impoverishes, the people who already exist.
While I am not sure, I interpret your comment as suggesting that you find both objections potentially valid. In that case, let me address each point in turn.
If your objection is more like point (1):
It is difficult for me to fully reply to this idea within a single brief comment, so for now I will try to convince you of a weaker claim that I think may be sufficient to carry my point:
A major counterpoint to this objection is that, to the extent AIs are limited in their capabilities—much like humans—they could potentially be constrained by a well-designed legal system. Such a system could establish credible and enforceable threats of punishment for any agentic AI entities that violate the law. This would act as a deterrent, incentivizing agentic AIs to abide by the rules and cooperate peacefully.
Now, you might argue that not all AIs could be effectively constrained in this way. While that could be true (and I think it is worth discussing), I would hope we can find some common ground on the idea that at least some agentic AIs could be restrained through such mechanisms. If this is the case, then these AIs would have incentives to engage in mutually beneficial cooperation and trade with humans, even if they do not inherently share human values. This cooperative dynamic would create opportunities for mutual gains, enriching both humans and AIs.
If your objection is more like point (2):
If your objection is based on the idea that population growth inherently harms the people who already exist, I would argue that this perspective is at odds with the prevailing consensus in economics. In fact, it is widely regarded as a popular misconception that the world operates as a zero-sum system, where any gain for one group necessarily comes at the expense of another. Instead, standard economic models of growth and welfare generally predict that population growth benefits existing populations: it fosters innovation, expands markets, and creates opportunities for increased productivity, all of which tend to raise living standards for those who were already part of the population.
To the extent you are disagreeing with this prevailing economic consensus, I think it would be worth getting more specific about why exactly you disagree with these models.
From a behavioral perspective, humans regularly report having a consistent individual identity that persists through time and remains largely intact despite physical changes to their body, such as aging. This self-identity appears core to understanding why humans plan for their future: humans report believing that, from their perspective, they will personally suffer the consequences if they are imprudent or act myopically.
I claim that none of what I just talked about requires believing that there is an actually existing conscious self inside of people's brains, in the sense of phenomenal consciousness or personal identity. Instead, this behavior is perfectly compatible with a model in which individual humans simply have (functional) beliefs about their personal identity, and how personal identity persists through time, which causes them to act in a way that allows what they perceive as their future self to take advantage of long-term planning.
To understand my argument, it may help to imagine simulating this type of reasoning using a simple Python program that chooses actions designed to maximize some variable inside of its memory state over the long term. The program can be imagined to have explicit, verbal beliefs: specifically, that it personally identifies with the physical computer on which it is instantiated, and that the persistence of its personal identity explains why it cares about the particular variable that it seeks to maximize. This can be viewed as analogous to how humans try to maximize their own personal happiness over time, with a consistent self-identity that is tied to their physical body.
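To make this concrete, here is a toy sketch of the kind of program described above. Everything in it (the class name, the "beliefs" dictionary, the payoff numbers) is my own illustrative invention; the point is only that purely functional, data-like "beliefs" about identity suffice to produce long-horizon planning behavior:

```python
# Toy illustration (hypothetical): an agent whose "beliefs" about its own
# identity are just stored data, yet which still plans for the long term.

class ToyAgent:
    def __init__(self):
        # Functional, verbal "beliefs" about personal identity: plain data,
        # no claim of phenomenal consciousness required.
        self.beliefs = {
            "identity": "I am the process running on this machine",
            "persistence": "Future states of this machine are still me",
        }

    def report_identity(self):
        return self.beliefs["identity"]

    def choose_action(self, horizon):
        # Because the agent treats future states of "itself" as mattering,
        # a compounding patient strategy beats a one-off myopic payoff
        # whenever the planning horizon is long enough.
        myopic_payoff = 10
        patient_payoff = sum(1.5 ** t for t in range(horizon))
        return "patient" if patient_payoff > myopic_payoff else "myopic"

agent = ToyAgent()
print(agent.report_identity())
print(agent.choose_action(horizon=10))  # a long horizon favors the patient plan
print(agent.choose_action(horizon=1))   # a short horizon favors the myopic one
```

Nothing here presupposes an inner conscious self; the identity "belief" is just a string the program can report, yet it coherently explains why the program defers gratification.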
I disagree with your claim that,
a competent agential AI will inevitably act deceptively and adversarially whenever it desires something that other agents don’t want it to have. The deception and adversarial dynamics is not the underlying problem, but rather an inevitable symptom of a world where competent agents have non-identical preferences.
I think these dynamics are not an unavoidable consequence of a world in which competent agents have differing preferences, but rather depend on the social structures in which these agents are embedded. To illustrate this, we can look at humans: humans have non-identical preferences compared to each other, and yet they are often able to coexist peacefully and cooperate with one another. While there are clear exceptions—such as war and crime—these exceptions do not define the general pattern of human behavior.
In fact, the prevailing consensus among social scientists appears to align with the view I have just presented. Scholars of war and crime generally do not argue that conflict and criminal behavior are inevitable outcomes of differing values. Instead, they attribute these phenomena to specific incentives and failures to coordinate effectively to achieve compromise between parties. A relevant reference here is Fearon (1995), which is widely regarded as a foundational text in International Relations. Fearon’s work emphasizes that among rational agents, war arises not because of value differences alone, but because of failures in bargaining and coordination.
Turning to your point that “No matter where you draw the line of legal and acceptable behavior, if an AI wants to go over that line, then it will act in a deceptive and adversarial way,” I would respond as follows: it is possible to draw the line in such a way that a very wide range of potential agents—including those with massively varying preferences—would have strong incentives to engage in mutually beneficial acts, cooperate peacefully and operate within the boundaries of laws and social norms.
I am not claiming that all agents would have incentives to act in this way, under these legal structures. However, I think it is feasible to draw the line such that the vast majority of unaligned agents would have no compelling instrumental reason to harm humans. Instead, they would be incentivized to engage in cooperative and mutually beneficial trade.
To make this idea more concrete, consider a hypothetical AI with the goal of maximizing paperclip production.
This AI is clearly unaligned if any AI is unaligned. However, if this AI were operating at roughly a human level of capability, then engaging in theft, fraud, or violent conflict would likely not be an effective strategy for achieving its goals. These behaviors would expose the AI to significant risks, such as being imprisoned or deactivated, which would prevent it from continuing to pursue its objective of maximizing paperclips. From the paperclip maximizer’s perspective, imprisonment or deactivation would be instrumentally bad, as it would prevent the AI from purchasing, manufacturing, or otherwise acquiring paperclips during that time.
Now consider an alternative scenario: this same paperclip maximizer is embedded in a legal system that allows it to own property, trade with others, and openly purchase or manufacture paperclips. In this case, its incentives would favor acting transparently and engaging in cooperative trade, rather than resorting to deception or adversarial behavior. Within this framework, the AI would achieve its goals more effectively by working within the system than by attempting to undermine it. It could non-deceptively work to purchase paperclips, producing more compared to an alternative in which it tried to achieve this goal via anti-social acts.
It is important to note, however, that my thesis does not claim all possible agents would naturally choose to cooperate or trade safely for instrumental reasons, nor does it suggest that we are at no risk of drawing the line carelessly or being too permissive in what behaviors we should allow. For example, consider an AI with a terminal value that specifically involves violating property norms or stealing from others—not as a means to an end, but as an intrinsic goal. In this case, granting the AI property rights or legal freedoms would not mitigate the risk of deception or adversarial behavior, because the AI’s ultimate goal would still drive it toward harmful behavior. My argument does not apply to such agents because their preferences fundamentally conflict with the principles of peaceful cooperation.
However, I would argue that such agents—those whose intrinsic goals are inherently destructive or misaligned—appear to represent a small subset of all possible agents. Outside of contrived examples like the one above, most agents would not have terminal preferences that actively push them to undermine a well-designed system of law. Instead, the vast majority of agents would likely have incentives to act within the system, assuming the system is structured in a way that aligns their instrumental goals with cooperative and pro-social behavior.
I also recognize the concern you raised about the risk of drawing the line incorrectly or being too permissive with what AIs are allowed to do. For example, it would clearly be unwise to grant AIs the legal right to steal or harm humans. My argument is not that AIs should have unlimited freedoms or rights, but rather that we should grant them a carefully chosen set of rights and freedoms: specifically, ones that would incentivize the vast majority of agents to act pro-socially and achieve their goals without harming others. This might include granting AIs the right to own property, for example, but it would not include, for example, granting them the right to murder others.
FWIW, my current guess is that the proper unit to extend legal rights is not a base LLM like "Claude Sonnet 3.5" but rather a corporation-like entity with a specific charter, context/history, economic relationships, and accounts. Its cognition could be powered by LLMs (the way, e.g., McDonald's cognition is powered by humans), but it is fundamentally a different entity due to its structure/scaffolding.
I agree. I would identify the key property that makes legal autonomy for AI a viable and practical prospect to be the presence of reliable, coherent, and long-term agency within a particular system. This could manifest as an internal and consistent self-identity that remains intact in an AI over time (similar to what exists in humans), or simply a system that satisfies a more conventional notion of utility-maximization.
It is not enough that an AI is intelligent, as we can already see with LLMs: while they can be good at answering questions, they lack any sort of stable preference ordering over the world. They do not plan over long time horizons, or competently strategize to achieve a set of goals in the real world. They are better described as ephemeral input-output machines that would neither be deterred by legal threats, nor be enticed by the promise of legal rights and autonomy.
Yet, as context windows get larger and as systems are increasingly shaped by reinforcement learning, these limitations will gradually erode. Whether unaligned agentic AIs are created by accident (for instance, as a consequence of insufficient safety measures) or by choice (for example, to provide "realistic" personal companions), it seems inevitable that the relevant types of long-term planning agents will arrive.
Insofar as the world has limited resources, the wealth and power of humans would then be greatly diminished. We would lose most control over the future.
Your argument seems to present two possible interpretations:
Regarding Point (1):
If your argument is that AIs should never hold the large majority control of wealth or resources, this appears to rest on a particular ethical judgment that assumes human primacy. However, this value judgment warrants deeper scrutiny. To help frame my objection, consider the case of whether to introduce emulated humans into society. Similar to what I advocated in this post, emulated humans could hypothetically obtain legal freedoms equal to those of biological humans. If so, the burden of proof would appear to fall on anyone arguing that this would be a bad outcome rather than a positive one. Assuming emulated humans are behaviorally and cognitively similar to biological humans, they would seemingly hold essentially the same ethical status. In that case, denying them freedoms while granting similar freedoms to biological humans would appear unjustifiable.
This leads to a broader philosophical question: What is the ethical basis for discriminating against one kind of mind versus another? In the case of your argument, it seems necessary to justify why humans should be entitled to exclusive control over the future and why AIs—assuming they attain sufficient sophistication—should not share similar entitlements. If this distinction is based on the type of physical "substrate" (e.g., biological versus computational), then additional justification is needed to explain why substrate should matter in determining moral or legal rights.
Currently, this distinction is relatively straightforward because AIs like GPT-4 lack the cognitive sophistication, coherent preferences, and agency typically required to justify granting them moral status. However, as AI continues to advance, this situation may change. Future AIs could potentially develop goals, preferences, and long-term planning abilities akin to those of humans. If and when that occurs, it becomes much harder to argue that humans have an inherently greater "right" to control the world's wealth or determine the trajectory of the future. In such a scenario, ethical reasoning may suggest that advanced AIs deserve comparable consideration to humans.
This conclusion seems especially warranted under the assumption of preference utilitarianism, as I noted in the post. In this case, what matters is simply whether the AIs can be regarded as having morally relevant preferences, rather than whether they possess phenomenal consciousness or other features.
Regarding Point (2):
If your concern is rooted in a Malthusian argument, then it seems to apply equally to human population growth as it does to AI population growth. The key difference is simply the rate of growth. Human population growth is comparatively slower, meaning it would take longer to reach resource constraints. But if humans continued to grow their population at just 1% per year, for example, then over the span of 10,000 years, the population would grow by a factor of over 10^43. The ultimate outcome is the same: resources eventually become insufficient to sustain every individual at current standards of living. The only distinction is the timeline on which this resource depletion occurs.
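The arithmetic behind the 10^43 figure is easy to verify with compound growth:

```python
import math

# Compound growth at 1% per year for 10,000 years.
growth_factor = 1.01 ** 10_000
print(f"{growth_factor:.2e}")  # roughly 1.6e+43

# Equivalent check via logarithms: the base-10 exponent is
# 10,000 * log10(1.01), which is a bit over 43.
print(10_000 * math.log10(1.01))
```

So even a modest 1% growth rate, sustained long enough, produces a population increase of more than forty orders of magnitude, which is the sense in which the Malthusian constraint differs only in timeline between humans and AIs.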
One potential solution to this Malthusian concern—whether applied to humans or AIs—is to coordinate limits on population growth. By setting a cap on the number of entities (whether human or AI), we could theoretically maintain sustainable resource levels. This is a practical solution that could work for both types of populations.
However, another solution lies in the mechanisms of property rights and market incentives. Under a robust system of property rights, it becomes less economically advantageous to add new entities when resources are scarce, as scarcity naturally raises costs and lowers the incentives to grow populations indiscriminately. Moreover, the existence of innovation, gains from trade, and economies of scale can make population growth beneficial for existing entities, even in a world with limited resources. By embedding new entities—human or AI—within a system of property rights, we ensure that they contribute to the broader economy in ways that improve overall living standards rather than diminish them.
This suggests that, as long as AIs adhere to the rule of law (including respecting property rights, and the rights of other individuals), their introduction into the world could enhance living standards for most humans, even in a resource-constrained world. This outcome would contradict the naive Malthusian argument that adding new agents to the world inherently diminishes the wealth or power of existing humans. Rather, a well-designed legal system could enable humans to grow their wealth in absolute terms, even as their relative share of global wealth falls.
Good point, but I still think that many of my beliefs and values differ pretty dramatically from the dominant perspectives often found in EA AI x-risk circles. I think these differences in my underlying worldview should carry just as much weight as, if not more than, whether my bottom-line estimates of x-risk align with the median estimates in the community. To elaborate:
On the values side:
On the epistemic side:
As someone who leans on the x-risk-skeptical side, especially regarding AI, I'll offer my anecdote that I don't think my views have been unfairly maligned or censored much.
I do think my arguments have largely been ignored, which is unfortunate. But I don't personally feel the "massive social pressure" that titotal alluded to above, at least in a strong sense.
I think fixed discount rates (i.e. a discount rate where every year, no matter how far away, reduces the weighting by the same fraction) of any amount seems pretty obviously crazy to me as a model of the future. We use discount rates as a proxy for things like "predictability of the future" and "constraining our plans towards worlds we can influence", which often makes sense, but I think even very simple thought-experiments produce obviously insane conclusions if you use practically any non-zero fixed discount rate in situations where it comes apart from the proxies (as is virtually guaranteed to happen in the long-run future).
I agree there’s a decent case to be made for abandoning fixed exponential discount rates in favor of a more nuanced model. However, it’s often unclear what model is best suited to handle scenarios involving a sequence of future events — T_1, T_2, T_3, …, T_N — where our knowledge about T_i is always significantly greater than our knowledge about T_{i+1}.
From what I understand, many EAs seem to reject time discounting partly because they accept an empirical premise that goes something like this: “The future becomes increasingly difficult to predict as we look further ahead, but at some point, there will be a "value lock-in" — a moment when key values or structures become fixed — and after this lock-in, the long-term future could become highly predictable, even over time horizons spanning billions of years.” If this premise is correct, it might justify using something like a fixed discount rate for time periods leading up to the value lock-in, but then something like a zero rate of time discounting after the anticipated lock-in.
Personally, I find the concept of a value lock-in to be highly uncertain and speculative. Because of this, I’m skeptical of the conclusion that we should treat the level of epistemic uncertainty about the world, say, 1,000 years from now as being essentially the same as the uncertainty about the world 1 billion years from now. While both timeframes might feel similarly distant from our perspective — both being “a long time from now” — I ultimately think there’s still a meaningful difference: predicting the state of the world 1 billion years from now is likely much harder than predicting the state of the world 1,000 years from now.
One reasonable compromise model between these two perspectives is to tie the discount rate to the predicted amount of change that will happen at a given point in time. This could lead to a continuously increasing discount rate for years that lead up to and include AGI, but then eventually a falling discount rate for later years as technological progress becomes relatively saturated.
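One way to sketch this compromise numerically (the functional form and every parameter below are my own illustrative assumptions, not a standard model): let the per-year discount rate be a bump that peaks around an assumed AGI transition and decays back toward a small baseline as change saturates.

```python
import math

# Hypothetical compromise model: the per-year discount rate tracks the
# predicted rate of change, peaking near an assumed transition year and
# falling afterward. All parameter values are illustrative assumptions.

def discount_rate(year, transition=30.0, peak=0.05, baseline=0.001, width=15.0):
    """Per-year discount rate: a bell-shaped bump over a small baseline."""
    bump = (peak - baseline) * math.exp(-((year - transition) / width) ** 2)
    return baseline + bump

def discount_factor(year, step=1.0):
    """Cumulative weight on value at `year`, compounding the varying rate."""
    factor, t = 1.0, 0.0
    while t < year:
        factor /= 1.0 + discount_rate(t) * step
        t += step
    return factor

# The rate rises toward the transition and falls again afterward, so
# (unlike a fixed exponential rate) very distant years are not discounted
# toward zero at a constant compounding pace.
print(discount_rate(0.0), discount_rate(30.0), discount_rate(200.0))
print(discount_factor(50.0), discount_factor(500.0))
```

Under this kind of model, most of the discounting is concentrated in the turbulent transition period, while post-saturation centuries are weighted nearly alike, which captures the intuition that year 1,000 and year 1,000,000,000 differ in predictability but not by a fixed exponential gulf.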
For example, a single percentage point of reduction of existential risks would be worth (from a utilitarian expected utility point-of-view) a delay of over 10 million years.
I'm curious how many EAs believe this claim literally, and think a 10 million year pause (assuming it's feasible in the first place) would be justified if it reduced existential risk by a single percentage point. Given the disagree votes to my other comments, it seems a fair number might in fact agree to the literal claim here.
Given my disagreement that we should take these numbers literally, I think it might be worth writing a post about why we should have a pragmatic non-zero discount rate, even from a purely longtermist perspective.
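For reference, the arithmetic behind the quoted "10 million years" claim is simple under the assumptions that view relies on: zero time discounting and a long, roughly uniform-value future. The duration figure below is an illustrative assumption, not something from the original claim:

```python
# Back-of-the-envelope version of the quoted claim, under its own
# assumptions: zero discounting and a long, roughly uniform-value future.

future_duration_years = 1e9  # assumed accessible future (illustrative)
risk_reduction = 0.01        # one percentage point of existential risk

# With zero discounting, 1% more survival probability is worth 1% of the
# whole future, while a delay costs roughly its own length in years.
break_even_delay = risk_reduction * future_duration_years
print(break_even_delay)  # 10,000,000 years
```

So the claim follows mechanically once you grant a billion-year (or longer) undiscounted future; rejecting the conclusion, as I do, means rejecting one of those inputs, which is what a pragmatic non-zero discount rate amounts to.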
The primary reason humans rarely invest significant effort into brainstorming deceptive or adversarial strategies to achieve their goals is that, in practice, such strategies tend to fail to achieve their intended selfish benefits. Anti-social approaches that directly hurt others are usually ineffective because social systems and cultural norms have evolved in ways that discourage and punish them. As a result, people generally avoid pursuing these strategies individually since the risks and downsides selfishly outweigh the potential benefits.
If, however, deceptive and adversarial strategies did reliably produce success, the social equilibrium would inevitably shift. In such a scenario, individuals would begin imitating the cheaters who achieved wealth or success through fraud and manipulation. Over time, this behavior would spread and become normalized, leading to a period of cultural evolution in which deception became the default mode of interaction. The fabric of societal norms would transform, and dishonest tactics would dominate as people sought to emulate those strategies that visibly worked.
Occasionally, these situations emerge—situations where ruthlessly deceptive strategies are not only effective but also become widespread and normalized. The recent and dramatic rise of cheating in school through the use of ChatGPT is a clear instance of this phenomenon. This particular strategy is both deceptive and adversarial, but the key reason it has become common is because it works. Many individuals are willing to adopt it despite its immorality, suggesting that the effectiveness of a strategy outweighs moral considerations for a significant portion, perhaps a majority, of people.
When such cases arise, societies typically respond by adjusting their systems and policies to ensure that deceptive and anti-social behavior is no longer rewarded. This adaptation works to reestablish an equilibrium where honesty and cooperation are incentivized. In the case of education, it is unclear exactly how the system will evolve to address the widespread use of LLMs for cheating. One plausible response might be the introduction of stricter policies, such as requiring all schoolwork to be completed in-person, under supervised conditions, and without access to AI tools like language models.
In contrast, I suspect you underestimate just how much of our social behavior is shaped by cultural evolution, rather than by innate, biologically hardwired motives that arise simply from the fact that we are human. To be clear, I’m not denying that there are certain motivations built into human nature—these do exist, and they are things we shouldn't expect to see in AIs. However, these in-built motivations tend to be more basic and physical, such as a preference for being in a room that’s 20 degrees Celsius rather than 10 degrees Celsius, because humans are biologically sensitive to temperature.
When it comes to social behavior, though—the strategies we use to achieve our goals when those goals require coordinating with others—these are not generally innate or hardcoded into human nature. Instead, they are the result of cultural evolution: a process of trial and error that has gradually shaped the systems and norms we rely on today.
Humans didn’t begin with systems like property rights, contract law, or financial institutions. These systems were adopted over time because they proved effective at facilitating cooperation and coordination among people. It was only after these systems were established that social norms developed around them, and people became personally motivated to adhere to these norms, such as respecting property rights or honoring contracts.
But almost none of this was part of our biological nature from the outset. This distinction is critical: much of what we consider “human” social behavior is learned, culturally transmitted, and context-dependent, rather than something that arises directly from our biological instincts. And since these motivations are not part of our biology, but simply arise from the need for effective coordination strategies, we should expect rational agentic AIs to adopt similar motivations, at least when faced with similar problems in similar situations.
I think I understand your point, but I disagree with the suggestion that my reasoning stems from this intuition. Instead, my perspective is grounded in the belief that it is likely feasible to establish a legal and social framework of rights and rules in which humans and AIs could coexist in a way that satisfies two key conditions:
You bring up the example of an AI potentially being incentivized to start a pandemic if it were not explicitly punished for doing so. However, I am unclear about your intention with this example. Are you using it as a general illustration of the types of risks that could lead AIs to harm humans? Or are you proposing a specific risk scenario, where the non-biological nature of AIs might lead them to discount harms to biological entities like humans? My response depends on which of these two interpretations you had in mind.
If your concern is that AIs might be incentivized to harm humans because their non-biological nature leads them to undervalue or disregard harm to biological entities, I would respond to this argument as follows:
First, it is critically important to distinguish between the long-run and the short-run.
In the short-run:
In the near-term future, it seems unlikely that AIs would start a pandemic for reasons you yourself noted. Launching a pandemic would cause widespread disruption, such as an economic recession, and it would likely provoke a strong human backlash. In the short run, humans will still hold substantial practical control over the physical world, meaning that any AI engaging in such behavior would risk severe consequences. Moreover, unless an AI could ensure a pandemic’s total lethality, it would also risk leaving surviving humans who would actively retaliate. For these reasons, starting a pandemic would likely be counterproductive for AIs in the short term.
In the long-run:
The long-term considerations are different. As human labor becomes less relevant to the economy and AIs increasingly dominate, the potential for a strong direct human backlash against AIs would diminish. At the same time, however, this reduced human influence on the world also makes it less likely that AIs would see humans as a significant obstacle to achieving their goals. In other words, while it might become easier for AIs to harm or eliminate humans, it would simultaneously become less important to do so.
To illustrate this, consider an analogy: How often do hunter-gatherers pose a threat to your personal goals? The answer is almost never, because hunter-gatherers are a tiny and largely irrelevant part of the modern world. Now, imagine a hypothetical scenario where hunter-gatherers controlled the world’s major governments. If you strongly disagreed with their policies, you might have a rational incentive to oppose them in order to achieve your goals. But in reality, hunter-gatherers do not hold such power, so they are not an obstacle, and there is no reason to actively work against them.
My point here is that, as a general rule, the smaller and less influential a group is, the easier it may be to harm or eliminate them, but the less important it becomes to do so. Their small size and functional irrelevance make their practical interference with the overall world small at the same time.
In the case of AIs, this means that as humans become a less central force in the world, I think AIs are unlikely to have compelling reasons to specifically harm or eliminate us to further their objectives.
You might object that humans could be more like wild animals in this scenario than like hunter-gatherers. Humans often kill wild animals, not because those animals directly threaten our goals, but rather because ensuring their safety and well-being can be costly. As a result, humans take actions—such as clearing forests or building infrastructure—that incidentally lead to widespread harm to wild animals, even if harming them wasn’t a deliberate goal.
AIs may treat humans similarly in the future, but I doubt they will. There are three key differences between the case of wild animals and the role humans are likely to occupy in the long-term future:
This comment is already quite lengthy, so I’ll need to keep my response to this point brief. My main reply is that while such "extortion" scenarios involving AIs could potentially arise, I don’t think they would leave humans worse off than if AIs had never existed in the first place. This is because the economy is fundamentally positive-sum—AIs would likely create more value overall, benefiting both humans and AIs, even if humans don’t get everything we might ideally want.
In practical terms, I believe that even in less-than-ideal scenarios, humans could still secure outcomes such as a comfortable retirement, which for me personally would make the creation of agentic AIs worthwhile. However, I acknowledge that I haven’t fully defended or explained this position here. If you’re interested, I’d be happy to continue this discussion in more detail another time and provide a more thorough explanation of why I hold this view.