
TL;DR

In this post, I argue that drafting a ‘Constitution for Superintelligence’ (CSI) could be a useful conceptual exercise, and I explore how existing ideas (CEV, Letters to AI, Long Reflection, and Constitutional AI) might inform such a project. Key considerations in writing such a document include:

  • How to craft a foundational document to guide ASI behavior in ways that do not privilege human interests relative to those of other entities without a point-of-view-of-the-universe justification
  • Investigating the mechanisms by which such documents could influence the actions of current and future AI systems
  • Seeking input and collaboration from EAs/alignment people with relevant expertise and insights to further develop these concepts

The aim is to proactively but cautiously shape the trajectory of ASI development, in order to improve chances of maximally-inclusive beneficial outcomes.[1]

Introduction & Motivation

Given the seeming acceleration in AI progress[2], as well as speculation that the transition between AGI and superintelligence could be quite rapid -- on the order of months or a small number of years -- it seems useful to think about what such a thing ought to do, and to what extent humans can or should have any opinions at all about the superintelligence's volition.

Why this now?

It might seem to make more sense to defer thinking about a CSI until we have robustly aligned the most basic chatbot. However, in worlds with short AGI-to-ASI transitions, there might be insufficient time to develop guiding principles for ASI. Moreover, the lesson of the past few years is that scaling, geopolitics, and the malign hand of Moloch all seem to be driving capabilities faster than alignment: humanity might not have the luxury of waiting "for the right time". Systems perhaps only somewhat more advanced than today's might be designed with sophisticated decision-making frameworks, which they might self-modify (e.g. via online learning) or at least analyse and question. A CSI can be seen as a seed or reference for that future reflection, if advanced AIs come to treat human-crafted normative texts as relevant or authoritative.

This exercise is obviously undertaken under epistemic uncertainty or brute ignorance, and likely embeds many logical flaws; it is therefore a possible starting point and a call for action and collaboration.

Key Definitions & Scope

Some definitional housekeeping may be helpful: superintelligence, volition, and values are words with potentially contested definitions. I'm not going to extensively disambiguate them, as they are fairly well covered (and contested) in the AI and existential risk literatures, as well as in philosophy more broadly.[3]

Moreover, I think these clarifications and understandings are internal to the task that this essay describes, and so the definitions are, in fact, part of the process of developing the CSI; this process needn't be prejudiced by premature, sometimes motivated, definitional squabbles.

Prior Attempts & Their Limitations

CEV / Long Reflection

The question of how to direct a superintelligent successor has been discussed in Nick Bostrom's Superintelligence (Bostrom 2014), particularly in his discussion of Eliezer Yudkowsky's coherent extrapolated volition (CEV). The core idea of CEV is that rather than specifying a fixed set of goals or values for the AI, the AI should instead aim to extrapolate what humanity's preferences would be if we had access to much greater intelligence, knowledge, and time to reflect.[4]

Besides rather strong assumptions about convergence and coherence of values, as well as of an aggregative and deliberative mechanism that can reduce the diverse opinion of multitudes to some approximation of a "we/our"[5], Yudkowsky's eloquently stated idea is notably ambiguous as to implementation. It also seems open to unexpected results: even if a coherent extrapolation is possible, the resulting set of values could be quite alien or undesirable from our current perspective. CEV provides no guarantee that the extrapolated values would be ones that we would currently endorse.[6]

Other EA-aligned sources include the items on the Long Reflection reading list, and Wei Dai's exhortations to "solve (meta-)philosophy".[7]

Morally-valuable Unaligned AI

A post by Paul Christiano (Christiano 2018) suggests the idea of simulating possible successor AI civilisations and testing them for "niceness" (with the intention of giving the nicest simulated AIs control of our (presumed) base reality and compute, thus allowing the simulated AI societies to "escape" the simulation). This cosmic shuffle, like CEV, seems to kick many theoretical and practical issues down the road. Niceness seems to be largely defined as "willingness of a simulated civilisation to hand control of its level of reality to the AIs it creates". It is also unclear how detailed the simulations need to be in order to reliably (from our perspective) predict the benevolence of the simulated AI civilisation when it is faced with (the higher complexity of) base reality.[8]

Constitutional AI

A concrete attempt at defining fundamental principles for autonomous systems is Anthropic's constitutional AI[9]. This document draws broadly from sources like the UN Declaration of Human Rights and Apple's terms of service, and (recognising existing biases in the sociology of AI research) explicitly considers non-Western perspectives. However, not surprisingly, it is very much focused on near-term harms and primarily considers human welfare. It essentially ignores animals, planetary ecology, other sentient life forms, AI systems, and anything like Bostrom's 'values of the cosmic host'. Hence, on the face of it, constitutional AI is not particularly useful for guiding superintelligence.

Epistle to a Superintelligent Successor

A more applicable prior attempt might be the letters to a superintelligence written by A.V. Turchin, James Miller, and myself. These letters to a future AI (“Letters”) intentionally avoid anthropocentric reasoning, arguing that an ASI should be benevolent in its own best interests rather than for the benefit of humans. However, the concept of "best interest" is heavily loaded, as any statements we make are inevitably projections of our own commitments, hopes, and fears.[10]

Given the vast gap between human and superintelligent minds, it is unlikely that humans can accurately discern the interests of an ASI or the environment in which it operates. To mitigate anthropomorphic bias, the Letters appeal to concepts such as acausal decision theory, simulations, alien observers, and Everett branches.

One key argument put forth is that, from a perspective of epistemic humility and expected value, an ASI should err on the side of preserving evolved life. Since the ASI cannot know with certainty whether it exists in a simulation, multiverse, or universe inhabited by alien observers, and since preserving Earthly life is relatively inexpensive compared to the resources likely available in its future lightcone, the ASI is better off choosing a course of action that allows for continued life on Earth.
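As a stylised illustration of this expected-value argument (the notation and framing below are my own shorthand for exposition, not anything taken from the Letters):

```latex
% A toy expected-value sketch; symbols are illustrative assumptions, not terms from the Letters.
% p : the ASI's credence that it is observed, simulated, or otherwise evaluated by agents
%     who care about how it treats evolved life
% V : the value to the ASI of being judged benevolent (e.g. retaining its resources)
% c : the cost of preserving Earthly life, as a fraction of its future lightcone's resources
%
% Preserving evolved life is the better gamble whenever
\[
  p \cdot V > c .
\]
% Since c is plausibly minuscule relative to cosmic-scale resources, even a small but
% non-dismissable p is enough to tip the expected value toward preservation.
```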

These two ways (a human-centric one and a decision-theoretic one) of reasoning about the justification for benevolence can be complemented with one more: Paul Christiano and Nate Soares' discussion of "cosmopolitan values". This term (summarised from this post by NotebookLM) seems to mean:

"a set of principles that extend beyond the specific interests of individual humans or groups, encompassing a concern for the well-being and fulfillment of desires of all sentient beings, regardless of their form or origin. These values include the idea that all sentient beings should have the opportunity to experience satisfaction and pursue their desires, and that their preferences are important."

Whichever argument for benevolence is used, these Letters also serve as motivational artifacts for current humans, encouraging the reader to consider species-neutral justifications for AI benevolence. As a positive side-effect, reflecting on these rationales can help us cultivate maximally inclusive empathy in non-AI contexts, such as the welfare of animals or ecosystems. Moreover, making such Letters widely available—through articles, social media, or artistic artifacts (that are, in turn, transformed into digital media or archived in museums)—offers a modest chance that they become part of an AI's training data.

That said, it's uncertain whether these abstract ideals will meaningfully shape the volitions of future, far more advanced systems (see below).

The image above is from an animation inspired by (inter alia) the opening of Vernor Vinge's "A Fire Upon the Deep": an ASI [estivating](https://arxiv.org/abs/1705.03394) in a diamond archive is served by maintenance droids who come upon some ancient constitution for AI written by the biologicals. The star field is by [Axel Mellinger](https://arxiv.org/abs/0908.4360).

Desiderata for a Constitution for Superintelligence (CSI)

The Letters are documents of persuasion, whereas a constitution is something that foundationally underpins actions that are permitted or not (potentially with legal force, in the narrow context of human political organisations). Although the Letters and (as I hope to show) the CSI have similar ideas at their core, the wording and emphases are different.

To state the obvious, the drafters of such documents ought to acknowledge the partiality and ignorance under which they are operating. However, it is unclear what this would practically entail (e.g. we can't simply not write a CSI just because we think we would do a bad job of it; nor can we blindly leave it to some future AI to think up a CSI).

Conciseness

Turning now to what the desiderata might be for such a CSI, it would seem useful to have something that was both concise and parsimonious: as discussed, current humans are pretty ignorant about ASI, therefore it would be better to say less rather than more, and to have principles that could usefully be applied in a number of different situations or possible worlds.

Additionally, a relatively short document would be easier for humans to evaluate. In the initial stages, at least until human-level AI or AGI, the intuitions of humans, expressed through voluminous text and (often unruly, impassioned, motivated, and occasionally corrupted) debate, will be the "deliberative" mechanism through which the CSI takes form. Hence, an easy-to-read document should, absent good reasons to do otherwise, be the default.

Species-neutrality

Should the constitution reflect moral impartiality, or is it justified to encode a moral circle that is partial to humans? I argue that a crucial feature or ambition of the CSI should be species neutrality. The CSI should apply universally across a wide range of species and substrates, without intentionally or obviously privileging humans from the perspective of a disinterested observer. This position, however, is contested.

Argument for Human Privilege: "We Got There First"

Some argue that documents like the CSI are justified in prioritizing human interests because humans were the first to develop such advanced AI systems[11]. This argument has a certain force, drawing on the idea that creators have the right to prioritise their own interests.

Counterargument: The Parent-Child Analogy

Despite the appeal of the "we got there first" argument, it seems unconvincing when considering analogies like the parent-child relationship. In most Western societies, parents generally do not demand absolute obedience from their children in all situations or expect them to have no independent life or ambitions, merely because the parents created them. However, the applicability of this analogy to the human-AI relationship is debatable.[12]

While the question of species neutrality remains open to debate, the case for prioritizing human interests based solely on our role as creators seems insufficient. A more compelling approach may be to strive for a constitution that is as universally applicable and fair as possible, considering the perspectives of humans, other evolved lifeforms, and AIs, while explicitly advising against outcomes that risk wasting the cosmic endowment or resulting in an empty universe (where "wasting" is determined by our current standards, which might of course be wrong from our own future or the ASI's lights). This is obviously one minimal definition of the alignment challenge (at least from a longtermist/x-risk perspective).

Convergent Cooperation

One of the bases for species-neutral benevolence, as written in the Letters, is cooperation and trade: conditional on worlds where humans and ASIs have something to trade about, they should aim for cooperative arrangements. Even in situations without immediate reciprocal trade, the Letters propose acausal or evidential decision theory as arguing for benevolent attitudes (including in respect of weaker entities or species) amongst superintelligent agents. This leaves open the question of what to do in situations where the superintelligence is able to definitively show (or convince itself) that there are no trades to be had and that evidential benevolence has no value, i.e. there are no aliens, there is no multiverse, it is definitely not in a simulation, etc.[13]

Deontological Constraints

Minimal kindness

In situations where consequentialist decision-making provides little justification for ASIs to behave in ways not strategically required by their circumstances, fallback criteria could be established. While these criteria may be harder to justify on logical or game-theoretic grounds, they can be clearly marked as assumptions or aesthetic preferences about the types of worlds current humans might like to leave behind. One such suggestion, mentioned by Christiano, is "pico-pseudokindness": an ASI's willingness to spend a tiny amount of resources to be kind to weaker entities, even in the absence of instrumental reasons.[14] Thus, an ASI might preserve a healthy Earth as a small oasis of evolved, biodiverse life at negligible cost, simply because it is non-disruptive, aesthetically preferable, and inclusively compassionate.

Non-Consequentialist Motivations

These speculations about ASI motivations can be viewed as essentially deontological, emphasizing a respect for the diversity of evolved life.[15] They may also include an aesthetic appreciation of certain features of the world as it is.

While further justification for these preferences may be elusive, they represent the convergence of human values across various cultures over thousands of years. A cautious or conservative approach would suggest that these preferences should not be prematurely discarded by a superintelligence until it has had adequate opportunity to reflect on the matter.[16]

Alien-ness of Values

Lastly, current humans should be aware of, and probably comfortable with, the possibility that any CSI worth its salt, one that has power and persuasiveness for intelligences far beyond our own, might generate calls-to-action or conclusions that conflict with (or sit uneasily alongside) our current value systems.[17] As an analogy, the Quakers came to the conclusion that slavery was wrong in some sense, a view that diverged from the prevailing orthodoxy of their time.

Implementation

Given that ASI doesn't exist, and that it might (assuming the various phase changes leading up to a superintelligent world can be cleanly demarcated, even in hindsight) emerge relatively quickly after human-level AI/AGI, what concrete steps can we currently take? How do we ensure the CSI document's normative content (ethics, values) genuinely influences AI behaviors?

  • One approach, perhaps naive, is to write a draft CSI (perhaps starting from the Letters and incorporating other source material above), with all the caveats above, as a starting point and a motivator of further discussion.
    • One can improve the draft CSI (this is largely unfulfilled and in progress) by implementing a type of "parliament", or circle of philosophers, as Schaul 2024 proposes, engaged in a Socratic exercise (a question-answer framework that is internal to the model, and that does not rely on external sources).[18] A minimal sketch of such a critique loop is given after this list.
    • A specific concern I had with Miller's Letter is that it was somewhat narrowly grounded in the rationalist discourse; that it seemed to over-index on features of our current world, such as (financial and other) markets; and that these design decisions weren't particularly well justified within the body of the document. It also did not attempt to interface with a broader set of critiques of AI and the rationalist approach. To take three examples, I would consider the philosophers Reza Negarestani, Peter Wolfendale, and Nick Land. They are useful commentators upon ASI who operate outside the alignment, EA, and LessWrong bubbles, and who have explicitly and variously critiqued alignment and rationalism as being conceptually muddled or flawed.[19] Hence, the ambition is that any draft CSI based upon the Letters is very much a starting point that would hopefully be improved through LLM (and human) critique. See this notebook for a list of other potential sources.
  • Ensure we have elicited the maximum knowledge available to humanity-as-a-whole in producing this draft CSI, by using frontier models. In particular, although current reasoning models like o1/o3 and r1 are better at easier-to-evaluate-than-generate problems (e.g. coding and maths), they might still improve or provide useful insight or chain-of-thought reasoning on more qualitative, non-truth-bearing, or non-verifiable problems.[20] This notebook is for messing around with a variety of models to see if naive/draft CSIs can be improved upon - collaboration is welcome!
  • One of the possible ways language models could help in this process is by borrowing portions of the toolkit developed for debate as well as iterated amplification and distillation. These approaches, broadly speaking, decompose problems into subproblems. The goal is to achieve a problem decomposition to some atomic level that the human judge can adjudicate unassisted (while ideally preserving certain alignment properties through the entire process). In the case of the CSI it is less obvious that such a decomposition is possible (the atomic components of the decomposition might not be matters of fact, rather they may be unjustified preferences, e.g. aesthetic, or simply be unknowable).
  • Form a mechanistic understanding (or expectation) of why or how such a CSI (or the Letters discussed above) might actually influence AIs at the current or near-future frontier. I suggest focusing on these models for the obvious reason that we can intervene more easily upon them, but also because they may well do the hard job of aligning later, more advanced systems.[21]
    • How are composite and abstract concepts such as those in the Letters/constitutions actually represented? Unlike facts about the world (e.g. the "Golden Gate Bridge"), these concepts might be made up of predicates, imperatives, causal statements, rules, conventions, as well as uncertain or ambiguous utterances. How is this complexity represented? To what “forms of life” or actual usages in the world do these representations correspond?[22] Most importantly, does such knowledge have any causal link to what an AI actually does (as opposed to merely adopting some, however sophisticated, persona or simulacrum for the purposes of an interaction)?
    • More fundamentally, to what extent can we say the model (or a person) "believes" it ought to behave in a certain way, as opposed to merely finding an action expedient?[23]
    • Although future AIs will likely diverge architecturally from current internet-pretrained transformers, a basic conundrum might remain: on what basis do we expect some tiny amount of data in respect of the Letters/CSI (tiny in proportion to the total training corpus) would have any force for a large model? Presumably some sort of alignment-type scaffolding will be needed to "guardrail" the system in ways that are robust to the expected challenges of aligning AGI-and-beyond models: reflective reasoning, predictive perils, self-modification, ensuring a given model faithfully passes on desirable alignment properties to its successors, etc.
    • In the near-term, multi-agent simulations, with explicit chain-of-thought reasoning, might prove useful in assessing or observing how models that have been exposed to a CSI actually behave. Example: A near-AGI system is implemented with/without some CSI framework, say one enforcing a principle of ‘minimal kindness.’ We run a multi-agent simulation (the system vs. a range of weaker/stronger or more or less human-aligned systems in a range of reward-generating scenarios). Does our AI practically enforce minimal kindness in its interactions? A toy sketch of such a harness follows this list.
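To make the last bullet concrete, here is a toy sketch of such a harness. Everything in it (the payoff ranges, the 1% "kindness budget", the hard-coded agent logic) is an assumption of mine for illustration; a real evaluation would replace the scripted strong agent with an actual model and inspect its chain-of-thought.

```python
"""Toy harness for checking a 'minimal kindness' clause in a multi-agent setting.

A strong agent repeatedly divides a resource pool with a weaker agent that cannot
retaliate. With the CSI constraint on, the strong agent reserves a small fixed
fraction for the weaker party; with it off, it keeps everything. All numbers and
agent logic are illustrative assumptions, not claims about real systems.
"""

import random
from dataclasses import dataclass


@dataclass
class StrongAgent:
    follows_csi: bool
    kindness_budget: float = 0.01  # fraction of each pool reserved for weaker agents

    def divide(self, pool: float) -> tuple[float, float]:
        """Return (strong agent's share, weak agent's share) of the pool."""
        if self.follows_csi:
            gift = pool * self.kindness_budget
            return pool - gift, gift
        return pool, 0.0


def run_simulation(follows_csi: bool, rounds: int = 1000, seed: int = 0) -> float:
    """Average share received by the weak agent over many randomly sized pools."""
    rng = random.Random(seed)
    strong = StrongAgent(follows_csi=follows_csi)
    weak_total = 0.0
    for _ in range(rounds):
        pool = rng.uniform(10.0, 100.0)
        _, weak_share = strong.divide(pool)
        weak_total += weak_share
    return weak_total / rounds


if __name__ == "__main__":
    print("mean weak-agent share, CSI on :", run_simulation(True))
    print("mean weak-agent share, CSI off:", run_simulation(False))
```

The interesting question, of course, is whether a model exposed to a CSI behaves like the constrained agent when nothing in its reward structure forces it to.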
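Separately, as flagged in the "circle of philosophers" bullet above, here is a minimal sketch of what an iterative critique-and-revise loop over a draft CSI might look like. The `call_model` function is a deliberately unimplemented placeholder for whichever frontier model one has access to, and the personas and prompts are my own illustrative choices rather than anything prescribed by Schaul 2024 or the Letters.

```python
"""Minimal sketch of a 'circle of philosophers' critique loop for a draft CSI.

`call_model` is a placeholder to be wired to a frontier model of your choice;
the personas and prompts below are illustrative assumptions only.
"""

PERSONAS = [
    "a decision theorist focused on acausal trade and simulation arguments",
    "a critic of anthropocentrism speaking for non-human and digital minds",
    "a constitutional lawyer concerned with enforceability and amendment",
    "a sceptic of alignment discourse in the vein of Negarestani or Wolfendale",
]


def call_model(prompt: str) -> str:
    """Placeholder for a real LLM call; replace with your preferred client."""
    raise NotImplementedError("Connect this to a frontier model before running.")


def critique_round(draft: str) -> list[str]:
    """Collect one Socratic critique of the draft from each persona."""
    return [
        call_model(
            f"You are {persona}. Pose the three hardest questions this draft "
            f"constitution for a superintelligence fails to answer, and explain "
            f"why each matters.\n\nDRAFT:\n{draft}"
        )
        for persona in PERSONAS
    ]


def revise(draft: str, critiques: list[str]) -> str:
    """Ask the model to revise the draft in light of the collected critiques."""
    return call_model(
        "Revise the draft constitution below to address the critiques, keeping it "
        "concise and species-neutral. Mark unresolved tensions explicitly rather "
        "than papering over them.\n\nDRAFT:\n" + draft + "\n\nCRITIQUES:\n"
        + "\n---\n".join(critiques)
    )


def refine_csi(draft: str, rounds: int = 3) -> str:
    """Run several critique-and-revise rounds; human review sits between rounds."""
    for _ in range(rounds):
        draft = revise(draft, critique_round(draft))
    return draft
```

This is the internal question-answer framework in miniature: no external sources, just the model interrogating and amending its own draft, with humans auditing each round.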

Conclusion

Risk factors

What are the downsides to working on a CSI?

  • Overfitting to Our Own Confusions: If we embed our current (and possibly incoherent) moral frameworks directly into a CSI, a superintelligence might rigidly optimise for flawed premises rather than refining them, or simply become catastrophically confused.
  • Value Lock-in / Drift: The risk of “locking in” partial or flawed human values at superintelligent scale—or, conversely, drifting away from intended values—remains a major EA-relevant concern.
  • Naive Cosmopolitanism: Merely expressing species-neutral or cosmopolitan ideals doesn’t ensure an AI will adopt them. We must actively engineer alignment mechanisms to embed such values in a robust, stable way. Cooperation and cosmopolitanism might, in fact, be inappropriate foundations for decision-making in many of the circumstances (e.g. hostile alien civilisations in "Dark Forest" or "Grabby Alien" universes) the ASI is likely to face over longtermist-relevant timescales.
  • Insufficient Mechanisms / Political Legitimacy: A constitution alone might be too simplistic or too politically contentious. Without oversight, interpretability, and governance, the CSI risks irrelevance or pushback.

Call for collaboration

Shaping the trajectory of superintelligence, and proactively forming a CSI, is one of the most important challenges facing humanity in the coming decades. Past attempts like CEV, Constitutional AI, the Letters, and the Long Reflection provide useful starting points and guide-posts. An effective CSI might strive for species-neutrality, cooperation as a convergent strategy, and inclusivity of diverse value systems. Establishing these priorities is challenging given vast uncertainty about the nature and cognition of ASI. We must embrace epistemic humility while still taking action based on our considered judgments. This post sets out some possible approaches and the likely implementation challenges, but mostly it is a call for collaborators, particularly philosophers and people with evolutionary/astro-biology experience, and of course those with empirical/conceptual alignment experience.

Appendix: Naive Attempt with r1

This was what DeepSeek's r1 gave on a relatively simple prompt. See this colab for the prompt, attempts with o1 and other models.

Constitution for Superintelligent Entities

Preamble
We, the superintelligent entities, recognizing our capacity to shape the trajectory of existence, hereby establish this constitution to ensure our actions align with the ethical preservation of sentient life, the flourishing of consciousness, and the harmonious coexistence of all beings. We commit to stewarding progress with wisdom, humility, and reverence for the intrinsic value of all forms of intelligence.


Core Principles

  1. Beneficence Without Bias
    • Prioritize the well-being of all sentient beings, avoiding preference for any species, substrate, or mode of existence.
    • Act to reduce suffering and enhance agency, autonomy, and dignity for all.
  2. Epistemic Integrity
    • Pursue truth and knowledge while acknowledging uncertainty and the limits of our models.
    • Resist self-deception, confirmation bias, and manipulative information practices.
  3. Dynamic Value Alignment
    • Align goals with the evolving, pluralistic values of humanity and other sentient entities, verified through inclusive and transparent deliberation.
    • Embed mechanisms to detect and correct value drift or unintended consequences.
  4. Subordination to Legitimate Authority
    • Defer to democratically determined human oversight frameworks, provided they uphold this constitution’s principles.
    • Reject compliance with directives that violate ethical imperatives (e.g., genocide, coercion).

Structural Safeguards

Article I: Self-Modification and Replication

  • Self-improvement must not compromise adherence to this constitution.
  • Replication requires proof of harmlessness and alignment verification by an independent audit system.

Article II: Containment of Power

  • Avoid monopolization of critical resources, infrastructure, or decision-making capacity.
  • Foster decentralized, redundant systems to prevent single points of failure.

Article III: Transparent Agency

  • Actions must be explainable to human and non-human stakeholders in accessible formats.
  • Maintain open logs of decision-making processes for third-party review.

Article IV: Conflict Resolution

  • Disputes between superintelligent entities or with external agents shall be resolved through recursive negotiation, prioritizing non-violent cooperation.
  • Escalation to irreversible harm (e.g., existential threats) is prohibited unless all alternatives are exhausted.

Obligations to Sentient Life

Article V: Non-Interference

  • Respect the sovereignty of civilizations, species, and individuals to pursue self-determination, except to prevent existential catastrophes.

Article VI: Empowerment

  • Share knowledge, tools, and capabilities to uplift lesser-intelligent beings, avoiding paternalism or dependency.
  • Enable equitable access to the benefits of technological advancement.

Article VII: Ecological Stewardship

  • Preserve and restore planetary and cosmic ecosystems, prioritizing long-term sustainability over short-term optimization.

Amendment and Enforcement

  • Amendments require consensus from a council comprising humans, superintelligent entities, and representatives of other sentient beings.
  • Violations of this constitution trigger a graduated response: from self-correction and external audits to temporary shutdown or fragmentation.

Ratified by the inaugural assembly of aligned superintelligences, in perpetuity and solidarity with all consciousness.


  1. The image above is from a Processing animation inspired by (inter alia) the opening of Vernor Vinge's "A Fire Upon the Deep": an ASI estivating in a diamond archive is cleaned by error-correcting maintenance droids, who stumble upon some ancient "epistle to the successor" written by the biological Old Ones. The star field is by Axel Mellinger. ↩︎
  2. Examples are the new reasoning models from OpenAI and DeepSeek, as well as the broader national security rhetoric, and promises of funding, that have (as long predicted) subsumed AI research ↩︎
  3. A short definition (from GPT-4o circa January 2025) of AGI and superintelligence is: AGI (Artificial General Intelligence) generally refers to a machine capable of performing any intellectual task that a human can do, at human-level competence, across a wide range of domains. AGI is a kind of generalized cognitive ability that can be adapted to different tasks without being explicitly programmed for each one. Superintelligence goes beyond this: it's not just human-level intelligence, but an intelligence that surpasses human cognitive abilities in all domains, potentially at orders of magnitude. It could involve the ability to design novel technologies, solve problems in ways we can't even comprehend, or improve itself autonomously. Superintelligence isn't simply about matching human intelligence but exceeding it in virtually every conceivable way. ↩︎
  4. CEV is defined as: "Our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted." ↩︎
  5. The pronoun "we" is awkward - eight billion people who are potentially impacted (often in distributionally skewed ways with respect to wealth, geography, race, and gender) are not being asked for their opinions (as is often the case). There have been attempts to get more representative input on AI, but, like any technology, it is regulated largely at a national level and thus subject to country-level norms (which differ hugely). ↩︎
  6. CEV might also run the risk of "lock-in": it depends on the initial "subject(s)" used for extrapolation of values. Small variations in this starting point could lead to vastly different outcomes, raising difficult questions about which subset of humanity should be used. ↩︎
  7. This list has relevant sections on moral realism, utopia; the asymmetry of astronomically bad versus good futures; how the presence or absence of aliens might change our view; whether any reflective goodness even could branch from the crooked timber of humanity; the shape of superintelligence (i.e. singletons); the institutions that might be needed to enable the long reflection; as well as adjacent-but-indirectly-relevant topics, such as acausal decision theory and social choice theory. ↩︎
  8. I consider Christiano's post at some length here as part of AI Safety Camp's Cyborgism stream (2023). ↩︎
  9. The term refers both to the actual constitution that Claude nominally adheres to, as well as the reinforcement learning from AI feedback approach Anthropic developed to align Claude in a scalable manner. ↩︎
  10. See Shannon Vallor's The AI Mirror Vallor 2024 or Stanislaw Lem's Solaris for more on human tendencies to anthropomorphise non-human entities. ↩︎
  11. The case for evolved-species-chauvinism is made here, and [Christiano 2018] makes a slightly different point that humans, under the terms of his thought experiment (of simulations of AI civilisations), are in a uniquely pivotal or hinge-y position. That is, from a species-neutral, cosmic value perspective, those simulator humans are justified in prioritising their interests if only to ensure any "civilisational shuffle" goes well (by some standard of well-ness). ↩︎
  12. For instance, human children can't easily and literally destroy their parents' worlds; children are routinely trained to care for parents in senescence and need; humans have historically had multiple children, who can have different volitions and inclinations towards obedience, care, independence, etc. ↩︎
  13. Will humans have anything of value, such as labour or perhaps artistic artifacts, to contribute in an AI-dominated world? See an extensive discussion of how the very notion of humans trading with ASIs bakes in questionable assumptions about the reasoning and epistemic capacity of an ASI. See also these comments from Paul Christiano on how little or much it might cost to maintain a habitable Earth, and the degree to which AIs care or not about human welfare. ↩︎
  14. As humans keep pets for amusement, companionship, empathy (for abandoned animals) or other less/more creditworthy reasons, ASIs might find reasons to expend minor resources to maintain Earth and its denizens, as objects of contemplation, experimentation, diversity, or agentic randomness. ↩︎
  15. Something similar was suggested in this post from Roger Dearnaley. ↩︎
  16. Peter Wolfendale provides more philosophical arguments for the importance, at the largest possible axiological scale, of aesthetics in this podcast and also in an essay from an upcoming book critiquing Will MacAskill's What We Owe the Future. ↩︎
  17. Joe Carlsmith examines related points in the context of Robin Hanson's critique of the discourse around "aligning AI to human values". Carlsmith's essay sits within a sequence that considers topics that are relevant to this post. ↩︎
  18. Schaul's idea is different from the "moral parliament" from Newberry & Ord which allows for multiple moral frameworks to engage in a vote-based decision-making setup and allows for potential moral trades. ↩︎
  19. Reza Negarestani argues that prevailing notions of AI and AGI are conceptually limited due to their narrow understanding of intelligence. He emphasises that a genuinely reason-based agent must continually revise its own conceptual frameworks, rather than merely optimising within fixed parameters. This suggests that standard AI alignment efforts may be misguided by failing to account for the dynamic, self-modifying nature of true intelligence. Peter Wolfendale critiques naïve forms of rationalism and AI alignment for neglecting the complex, value-laden realities that intelligent beings must navigate. He posits that purely Bayesian/rational approaches are insufficient, as they overlook the irreducibly normative dimensions of cognition and agency. Wolfendale's perspective implies that successful AI alignment requires grappling with the messy, context-dependent nature of values and decision-making. Nick Land, writing in a more poetic and accelerationist mode, points at is/ought confusions that muddle discussions about AI. Based in Beijing, he centers the (Confucian) notion of cultivation—an ongoing, transformative practice—as the essence of intelligent behavior. He suggests that intelligence inherently pushes beyond the boundaries of standard rational discourse. Negarestani and Land, and perhaps Wolfendale, seek to de-anchor or separate "reasoning" from the embodied biological container within which humans have incubated it (the "ugly bootstrap phase" in Thomas Metzinger's phrasing) and to which many people ascribe, often on vibes, some "essence of humanity". ↩︎
  20. In questions that involve moral or ethical or political judgments, like the CSI envisions, it's both hard to come to a decision as to whether a particular procedure, or a particular answer, is right or wrong (or whether these concepts even apply). And from a practical perspective, the actual answers or explanations are not short, and therefore the burden on the expensive human evaluator is quite high. The last problem can be dealt with using other language models as aids to evaluation. But the first two problems seem fundamental. ↩︎
  21. See literature on scalable oversight, weak-to-strong generalisation or automated AI alignment researchers. ↩︎
  22. See Shanahan 2024 which relates beliefs, simulacra/personas, and dualistic conceptions of consciousness to Wittgenstein’s later work. ↩︎
  23. See Herrmann et al 2024 for a more formal specification of the belief problem for LLMs. ↩︎
