Summary: Even from an anti-realist stance on morality, there are various reasons we might expect moral convergence in practice.
[Largely written two years ago; cleaned up for draft amnesty week. The ideas benefited from comments and conversations with many people; errors remain my own.]
Consider:
Convergent morality thesis: for some non-tiny fraction of possible minds, their extrapolated volitions will (approximately) coincide — and also coincide with what we’d end up thinking was good (i.e. we ourselves are in this non-tiny fraction).
(This is something like an empirical analogue of moral realism. The idea is that the thing converged upon can be thought of as ~the good, although you don’t need to make any commitment to realism to follow this line of reasoning.)
The central claim of this post is that the convergent morality thesis is quite plausible:
- I think I’m around 75%, although that feels non-robust and I’m interested in arguments that might shift me
- [my original draft from two years ago said “60%”, but that seems too low to me now]
- NB if the convergent morality thesis were true, this would mean the complexity of value thesis was false (because one could give a short pointer to something in the relevant set of minds), although there might be a massive amount of computation required to arrive at an applicable conception of the good
- e.g. perhaps sufficient information would be contained in the idea of “evolved social intelligence” and the skill of moral reflection
- (It’s possible that one of these itself turns out to be complex, but my intuition is that they’re both pretty simple in Kolmogorov terms …)
The convergent morality thesis might hold in a straightforward way or a more convoluted way.
The straightforward case
Perhaps it’s just the case that the process of moral reflection tends to cause convergence among minds from a range of starting points, via something like social logic plus shared evolutionary underpinnings.
The main intuition pump in favour of the straightforward case is that some limited version of this seems to apply after we restrict to humans:
- It seems like we can predictably make moral progress by reflecting; i.e. coming to answers + arguments that would be persuasive to our former selves
- I think I’m more likely to update towards the positions of smart people who’ve thought long and hard about a topic than the converse (and this is more true the smarter they are, the more they’re already aware of all the considerations I know, and the more they’ve thought about it)
- If I imagine handing people the keys to the universe, I mostly want to know “will they put serious effort into working out what the right thing is and then doing it” rather than their current moral views
Note that all of these intuitions apply more strongly to moral reasoning than to e.g. aesthetic reasoning (though feel less secure than with e.g. mathematical reasoning, where I better understand the underlying dynamics). And they apply more strongly to people from similar cultures than from very distant cultures (although I mostly still hold the intuitions for people from distant cultures, my evidence base there mostly comes from looking at philosophers in the past, who came from societies with significant differences but also significant overlap with my own).
So it looks like there’s at least a region of mind-space where this kind of dynamic applies. This doesn’t tell us what should happen with alien minds, but I think it’s a pretty big update away from a prior of “maybe what people want is just pretty arbitrary/idiosyncratic”. If the hypothesis is false, it’s either because (1) somewhere on the spectrum in-between us and alien minds the reflective process breaks down so that it loses the property of reflection leading to convergence; or (2) morality is a mix of the derivable and underivable (and humanity has quite idiosyncratic choices for the underivable parts). On my impression (1) is relatively implausible, but (2) is a real possibility (explored further below).
More generally, I suspect that intuitions in favour of moral realism are in many cases also intuitions in favour of the straightforward case for the convergent morality thesis, so there might well be some good discussion of this in the philosophical literature.[1]
Is morality a mix of the derivable and the underivable?
A thought experiment suggests this at least sometimes happens. If we had a society of beings who deep in their bones valued paperclips, and another who deep in their bones valued staples, it does seem somewhat likely they’d both derive e.g. prohibitions against stealing, or utilitarian instincts towards resource-allocation, as these would help the societies to run more effectively towards their eventual goal of producing vast quantities of their preferred stationery.
Is this what’s going on for humans? I’m not sure that it is. It does seem to me that a lot of human morality has developed in service of the goal of making more humans (and hence making society prosper so that it can afford to feed more humans, win conflicts with other societies, etc.). But I don’t think that has stuck with us; if you look at the outputs of our moral reflection I think we’ve gone deeper than things which are just in the service of making more humans.
If I’m right about that (that making more humans ultimately drove a lot of our moral intuitions but is a scaffolding we will eventually relinquish — and in many cases already have), there are three possibilities:
- A) Each human will have a significantly different extrapolated volition (/axiology); the apparent convergence-from-reflection is a local phenomenon which occurs at some levels of sophistication but will fall away before the end.
- B) Many humans (e.g. those trying to be moral and reflective) will have convergent extrapolated volitions (/axiologies). But these will be based on various idiosyncrasies of humanity, and we can’t reasonably expect convergence from non-human minds.
- C) Our extrapolated morality will converge to something more universal, such that we could expect convergence from some alien minds (perhaps mostly just those who similarly had their moral intuitions shaped by evolution in a social setting, or perhaps something broader).
When I hear discussion of the complexity of value, I normally imagine people are supposing something like B). But my current guess is that it is the least likely of these three possibilities — I’m quite unsure how to think about this, but if forced to put numbers on them now I might say 40% A), 15% B), 45% C). I think B) gets penalized relative to the other two because it postulates more complex behaviour — rather than one basic pattern that applies across minds, it says there’s something relatively special about the closeness of human minds to each other relative to their closeness to other minds.
Then I think for practical decision-making purposes we should apply a heavy discount to world A): in that world, what everyone else would eventually want isn’t all that close to what I would eventually want. Moreover what me-of-tomorrow would eventually want probably isn’t all that close to what me-of-today would eventually want. So it’s much much less likely that the world we end up with even if we save it is close to the ideal one by my lights. Moreover, even though these worlds possibly differ significantly, I don’t feel like from my present position I have that much reason to be opinionated between them; it’s unclear that I’d greatly prefer imperfect worlds according to the extrapolated volition of some future-me, relative to the imperfect worlds according to the extrapolated volition of someone else I think is pretty reasonable.
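To make the arithmetic of that discount concrete, here is a minimal sketch. The credences (40% / 15% / 45%) are the ones given above; the discount factor of 0.1 on world A and the function name `decision_weights` are purely illustrative assumptions, not figures or terms from the post.

```python
# Credences over the three possibilities, as stated in the post.
credences = {"A": 0.40, "B": 0.15, "C": 0.45}


def decision_weights(credences, discount_a):
    """Scale world A's credence by discount_a, then renormalise to sum to 1.

    discount_a is a hypothetical parameter: the post says the discount on
    world A should be "heavy" but does not give a number.
    """
    scaled = {
        world: (p * discount_a if world == "A" else p)
        for world, p in credences.items()
    }
    total = sum(scaled.values())
    return {world: value / total for world, value in scaled.items()}


# 0.1 is an assumed, illustrative discount; any small value gives a similar picture.
weights = decision_weights(credences, discount_a=0.1)
for world, weight in weights.items():
    print(f"{world}: {weight:.2f}")
```

Under this assumed discount, world C ends up with most of the decision weight and world B comes second, even though B started as the least likely of the three. The qualitative picture does not depend much on the exact discount chosen, as long as it is heavy.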
The convoluted case
Perhaps many minds end up at a shared notion of what they’re aiming for, via acausal trade (getting to some grand bargain), or evidential cooperation in large worlds. I don’t understand the mechanisms here well enough to be confident, but it seems like a pretty realistic possibility.
The implications of convergence via such a mechanism could be a bit different than for straightforward convergence — since it’s not just predictive of where agents might end up, but creates possible mechanisms for actors in our universe to have some (small) influence over the thing that everyone converges to. (Of course it’s also possible to expect significant straightforward convergence, but then also expect convergence from this mechanism.)
[1] Does moral realism imply the convergent morality thesis? Not strictly, although it’s suggestive. And even if you believe both, presumably there’s some causal mechanism behind convergent morality. Personally, though, I find many intuitions that used to make me sympathetic to realism now make me sympathetic to the convergent morality thesis.
4 is a great point, thanks.
On 1--3, I definitely agree that I may prudentially prefer some possibilities to others. I've been assuming that from a consequentialist moral perspective the distribution of future outcomes still looks like the one I give in this post, but I guess it should actually look quite different. (I think what's going on is that in some sense I don't really believe in world A, so haven't explored the ramifications properly.)
This comment I just made on Will Aldred's Long Reflection Reading List seems relevant for this topic.
Overall, I'd say there's for sure going to be some degree of moral convergence, but it's often overstated, and whether the degree of convergence is strong enough to warrant going for the AI strategies you discuss in your subsequent posts (e.g., here) would IMO depend on a tricky weighting of risks and benefits (including the degree to which alternatives seem promising).
I agree with this endnote.
For my anti-realism sequence, I've actually made the stylistic choice of defining (one version of) moral realism as implying moral convergence (at least under ideal reasoning circumstances). That's notably different from how philosophers typically define it. I went for my idiosyncratic definition because, when I tried to find out what the action-guiding versions of moral realism are (here), many ways in which philosophers have defined "moral realism" in the literature don't actually seem relevant for what we should do as effective altruists. I could only come up with two (very different!) types of moral realism that would have clear implications for effective altruism.
(1) Non-naturalist moral realism based on the (elusive?) concept of irreducible normativity.
(2) Naturalist moral realism where the true morality is what people who are interested in "doing the most moral/altruistic thing" would converge on under ideal reflection conditions.
(See this endnote where I further justify my choice of (2) against some possible objections.)
I think (1) just doesn't work as a concept, and (2) is almost certainly false at least in its strongest form. But yeah, there's going to be degrees of convergence, and moral reflection (even at the individual level without convergence) is relevant also from within a moral anti-realist reasoning framework.
Yes. And there are many cases where evolution has indeed converged on solutions to other problems[1].
Some examples:
(Copy-pasted from Claude 3 Opus. They pass my eyeball fact-check.)