In footnote 14 you say: “It has also been suggested (Sandberg et al 2016, Ord 2021) that the ultimate physical limits may be set by a civilisation that expands to secure resources but doesn’t use them to create value until much later on, when the energy can be used more efficiently. If so, one could tweak the framework to model this not as a flow of intrinsic value over time, but a flow of new resources which can eventually be used to create value.”
This feels to me like it would be changing the framework considerably, rather than just a “tweak”.
For example, consider a “speed-up” with an endogenous end time. On the original model, this decreases total value (assuming the future is overall good). But if what we gain is a fixed pot of resources, speeding up progress forever doesn’t change total value, as the sketch below illustrates.
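A minimal sketch of the contrast (notation mine), taking total value to be V = ∫₀^τ v(t) dt and writing s > 1 for a uniform speed-up factor: if the end time is endogenous, so that the sped-up civilisation traverses the same history but ends at time τ/s rather than τ, then

$$V' = \int_0^{\tau/s} v(st)\,dt = \frac{1}{s}\int_0^{\tau} v(u)\,du = \frac{V}{s}.$$

Total value falls by a factor of s. On the fixed-resources picture, by contrast, the same pot is secured whatever the speed, so total value is unchanged. The two models disagree even about the sign of a speed-up’s effect.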
Like the other commenter says, I feel worried that v(.) refers to the value of “humanity”. For similar reasons, I feel worried that existential risk is defined in terms of humanity’s potential.
One issue is that it’s vague what counts as “humanity”. Homo sapiens count, but what about:
I’m not sure where you draw the line, or if there is a principled place to draw the line.
A second issue is that “humanity” doesn’t include the value of:
And, depending on how “humanity” is defined, it may not include non-aligned AI systems that nonetheless produce morally valuable or disvaluable outcomes.
I tried to think about how to incorporate this into your model, but ultimately I think it’s hard without it becoming quite unintuitive.
And I think these adjustments are potentially non-trivial. I think one could reasonably hold, for example, that the probability of a technologically-capable species evolving, if Homo sapiens goes extinct, is 90%; that the probability of non-Earth-originating alien civilisations settling the solar systems we would ultimately settle is also 90%; and that such civilisations would have similar value to a human-originating civilisation.
(They also change how you should think about longterm impact. If alien civilisations will settle the Milky Way (etc.) anyway, then preventing human extinction is actually about changing how interstellar resources are used, not whether they are used at all.)
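To see how much these numbers can matter, here is a rough illustration (my framing, assuming the two backstops are independent and that a successor civilisation would realise roughly the same value V as ours):

$$P(\text{no replacement}) = (1 - 0.9)(1 - 0.9) = 0.01,$$

so, on these inputs, human extinction forfeits only about 1% of V, rather than all of it.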
And I think it means we miss out on some potentially important ways of improving the future. For example, consider scenarios where we fail on alignment. There is no “humanity”, but we can still make the future better or worse. A misaligned AI system that promotes suffering (or promotes something that involves a lot of suffering) is a lot worse than an AI system that promotes something valueless.
I felt like the paper gave enhancements short shrift. As you note, they are the intervention that most plausibly competes with existential risk reduction, as their value scales with the total value of the future.
You say: “As with many of these idealised changes, they face the challenge of why this wouldn’t happen eventually, even without the current effort. I think this is a serious challenge for many proposed enhancements.”
I agree that this is a serious challenge, and that one should have more starting scepticism about the persistence of enhancements compared with extinction risk reduction.
But there is a compelling response as to why improvements to v(.) wouldn’t happen anyway: future agents may not want them to happen. Taking a simplified example: in one scenario, society is controlled by hedonists; in another, society is controlled by preference-satisfactionists. But, let us assume, the hedonists do in fact produce more value. I don’t think we should necessarily expect the preference-satisfactionists to switch to being hedonists, if they don’t want to switch.
(Indeed, that’s the explanation of why AI risk is so worrying from a longterm perspective. Future AI agents might want something valueless, and choose not to promote what’s actually of value.)
So it seems to me that your argument only works if one assumes a fairly strong form of moral internalism: that future agents will work out the moral truth and then act on that basis.
“While the idea of a gain is simple — a permanent improvement in instantaneous value of a fixed size — it is not so clear how common they are.”
I agree that gains aren’t where the action is, when it comes to longterm impact. Nonetheless, here are some potential examples:
These plausibly have two sources of longterm value. The first is that future agents might have slightly better lives as a result: perhaps one in a billion future people would be willing to pay the equivalent of $1 to see a real-life panda, or to learn about the life and times of a historically interesting figure. This scales with future population size, so is probably an “enhancement” rather than a “gain” (though it depends a little on one’s population ethics).
The second is if these things have intrinsic value. If so, then perhaps they provide a fixed amount of value at any time. That really would be a gain.
Another possible gain is preventing future wars that destroy resources. Suppose that, for example, there’s a war between two factions of future interstellar civilisation, and a solar system is destroyed as a result. That would be a loss.
You write: “How plausible are speed-ups? The broad course of human history suggests that speed-ups are possible,” and, “though there is more scholarly debate about whether the industrial revolution would have ever happened had it not started in the way it did. And there are other smaller breakthroughs, such as the phonetic alphabet, that only occurred once and whose main effect may have been to speed up progress. So contingent speed-ups may be possible.”
This was the section of the paper I was most surprised and confused by. You seemed open to speed-ups, but it seems to me that a speed-up across the whole rest of the future is extremely hard to achieve.
The more natural thought is that, at some point, we either hit a plateau or hit some hard limit on how fast v(.) can grow (perhaps driven by cubic or quadratic growth as future people settle the stars). But if so, then what looks like a “speed-up” is really an advancement, as the sketch below illustrates.
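A minimal sketch of that equivalence (notation mine): suppose v(t) = min{g(t), v*}, where g is increasing and reaches the hard ceiling v* at time T. A uniform speed-up by factor s > 1 gives

$$v'(t) = \min\{g(st),\, v^{*}\},$$

which hits the ceiling at T/s, exactly when a trajectory advanced by Δ = T(1 − 1/s) would, and the two coincide from then on. So, over a long future, even an arbitrarily large speed-up buys no more than a bounded advancement.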
I really don’t see what sort of action could result in a speed-up across the whole course of v(.), unless the future is short (e.g. 1000 years).
I broadly agree with the upshots you draw, but here are three points that make things a little more complicated:
Continued exponential growth
As you note: (i) if v(.) continues exponentially, then advancements can compete with existential risk reduction; (ii) such continued exponential growth seems very unlikely.
However, there is presumably some nonzero probability that v(.) continues growing exponentially forever, right up to the end point (and perhaps even at a very fast rate, like doubling every year). If so, the total value of the future would be dramatically greater than if v(.) grows cubically and/or eventually plateaus. So, one might argue, this is where most of the expected value lies, and advancements are therefore competitive, in expectation, with existential risk reduction.
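To make this concrete, with purely illustrative numbers of my own: writing p for the credence in permanent exponential growth,

$$\mathbb{E}[V] = p\,V_{\mathrm{exp}} + (1-p)\,V_{\mathrm{plateau}}.$$

If, say, p = 10⁻¹⁰ but V_exp exceeds 10¹⁰ × V_plateau, the first term dominates the expectation, and with it the case for advancements.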
Now, I hate this argument: it seems like it's falling prey to “fanaticism” in the technical sense, letting our expected value calculations be driven by extremely small probabilities.
But it at least shows that, when thinking about longterm impact, we need to make some tough judgment calls about which possibilities to ignore on the grounds that they drive “fanatical”-seeming conclusions, even while considering only finite amounts of value.
Aliens
Elsewhere, you note the loss of galaxies due to the expansion of the universe, which means that ~one five-billionth of the universe per year becomes inaccessible.
But if the “grabby aliens” model is correct, then that number is too low. By my calculation, if we meet grabby alien civilisations in, for example, one billion years (which I think is about the median estimate from the grabby aliens model), then we “lose” approximately 1 millionth of accessible resources to alien civilisations every year. This is still very small, but three orders of magnitude higher than what we get by just looking at the expansion of the universe.
(Then there’s a hard and relevant question about the value of alien civilisation versus the value of human-originating civilisation.)
Length of advancements / delays
“An advancement of an entire year would be very difficult to achieve: it may require something comparable to the entire effort of all currently existing humans working for a year.”
This is true when considering “normal” economic trajectories. But I think there are some things we could do that could cause much greater advancements or delays. A few examples:
Combining this with the “grabby aliens” point, there is potentially 0.1% of the value of the future to be gained from preventing delays (1000 years × one-millionth loss per year). That is still much lower than the loss of value from anthropogenic existential risks, but higher than from non-anthropogenic risks. It’s small enough that I think it’s not really action-relevant, but at the same time not totally negligible.
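Spelling out that arithmetic, using the one-millionth-per-year figure from the grabby-aliens estimate above:

$$1000\ \text{years} \times 10^{-6}\ \text{per year} = 10^{-3} = 0.1\%.$$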
Hi Toby,
Thanks so much for doing and sharing this! It’s a beautiful piece of work - characteristically clear and precise.
Remarkably, I didn’t know you’d been writing this, or had an essay coming out in that volume! Especially given that I’d been doing some similar work, though with a different emphasis.
I’ve got a number of thoughts, which I’ll break into different comments.
Existential risk, and an alternative framework
One common issue with “existential risk” is that it’s so easy to conflate it with “extinction risk”. It seems that even you end up falling into this use of language. You say: “if there were 20 percentage points of near-term existential risk (so an 80 percent chance of survival)”. But human extinction is not necessary for something to be an existential risk, so 20 percentage points of near-term existential risk doesn’t entail an 80 percent chance of survival. (Human extinction may not be sufficient for existential catastrophe either, depending on how one defines “humanity”.)
Relatedly, “existential risk” blurs together two quite different ways of affecting the future. In your model, V = v̄τ. (That is: the value of humanity’s future is its average value over time, multiplied by its duration.)
This naturally lends itself to the idea that there are two main ways of improving the future: increasing v̄ and increasing τ.
In What We Owe The Future I refer to the latter as “ensuring civilisational survival” and the former as “effecting a positive trajectory change”. (We’ll need to do a bit of syncing up on terminology.)
I think it’s important to keep these separate, because there are plausible views on which affecting one of these is much more important than affecting the other.
Some views on which increasing v̄ is more important:
Some views on which increasing τ is more important:
What’s more, changes to τ are plausibly binary, but changes to v̄ are not. Plausibly, most probability mass is on τ being small (we go extinct in the next thousand years) or very large (we survive for billions of years or more). But, assuming for simplicity that there’s a “best possible” and “worst possible” future, v̄ could take any value between 100% and −100%. So focusing only on “drastic” changes, as the language of “existential risk” does, makes sense for changes to τ, but not for changes to v̄.