Jim Buhler

Research fellow @ Existential Risk Alliance
395 karma · Joined · London, UK

Bio

Participation
4

I'm researching how to predict what values might control the future (if any), ways to estimate the expected value/sign of human expansion, and cooperation and conflict in non-causal contexts (particularly between AI systems).

Interests of mine include GPR, AGI governance, malevolent actors, as well as CLR's research agenda on cooperative AI and S-risks.

I used to be EA France's community director and am still doing some event management there.

I've also recently completed a Master's degree in ethics.

I've also written some stuff on LessWrong.

You can give me anonymous feedback here. :)

Sequences
1

What values will control the Future?

Comments
41

Topic contributions
4

Thanks!

Are you thinking about this primarily in terms of actions that autonomous advanced AI systems will take for the sake of optimisation?

Hmm... not sure. I feel like my claims are very weak and true even in future worlds without autonomous advanced AIs.


"One large driver of humanity's moral circle expansion/moral improvement has been technological progress which has reduced resource competition and allowed groups to expand concern for others' suffering without undermining themselves".

Agreed, but this is closer to argument (A), fleshed out in this footnote, which is not the one I'm assailing in this post.

Thanks Vasco! Perhaps a nitpick but suffering still doesn't seem to be the limiting factor per se, here. If farmed animals were philosophical zombies (i.e., were not sentient but still had the exact same needs), that wouldn't change the fact that one needs to keep them in conditions that are ok enough to be able to make a profit out of them. The limiting factor is their physical needs, not their suffering itself. Do you agree?

I think the distinction is important because it suggests that suffering itself appears as a limiting factor only insofar as it is strong evidence of physical needs that are not met. And while both strongly correlate in the present, I argue that we should expect this to change.

Interesting, thanks Ben! I definitely agree that this is the crux. 

I'm sympathetic to the claim that "this algorithm would be less efficient than quicksort" and that this claim is generalizable.[1] However, if true, I think it only implies that suffering is -- by default -- inefficient as a motivation for an algorithm.

Right after making my crux claim, I reference some of Tobias Baumann's (2022a, 2022b) work which gives some examples of how significant amounts of suffering may be instrumentally useful/required in cases such as scientific experiments where sentience plays a key role (where the suffering is not due to it being a strong motivator for an efficient algorithm, but for other reasons). Interestingly, his "incidental suffering" examples are more similar to the factory farming and human slavery examples than to the Quicksort example.

  1. ^

    To be fair, it's been a while since I've read about stuff like suffering subroutines (see, e.g., Tomasik 2019) and their plausibility, and people might have raised considerations going against that claim.

Thanks, Maxime! This is indeed a relevant consideration I thought a tiny bit about, and Michael St. Jules also brought that up in a comment on my draft.

First of all, it is important to note that UCC affects the neglectedness -- and potentially also the probability -- of "late s-risks" only (i.e., those that happen far enough in the future for the UCC selection to actually have time to occur). So let's consider only these late s-risks.

We might want to differentiate between three different cases:
1. Extreme UCC (where suffering is not just ignored but ends up being valued, as in the scenario I depict in this footnote). In this case, all kinds of late s-risks seem not only more neglected but also more likely.
2. Strong UCC (where agents end up being roughly indifferent to suffering; this is the case your comment assumes, I think). In this case, while all kinds of late s-risks seem more neglected, late s-risks from conflict do indeed seem less likely. However, this doesn't seem to apply to (at least) near-misses and incidental risks.
3. Weak UCC (where agents still care about suffering but much less than we do). In this case, same as above, except perhaps for the "late s-risks from conflict" part. I don't know how weak UCC would change conflict dynamics.

The more likely we think #2 is relative to #1 and #3, the more your point applies, I think (with the above caveat on near-misses and incidental risks). I may well have missed something, though. It's a bit complicated.

Thanks for the comment!

Right now, in rich countries, we seem to live in an unusual period Robin Hanson (2009) calls "the Dream Time". You can survive while valuing pretty much whatever you want, which is why there isn't much selection pressure on values. This likely won't go on forever, especially if humanity starts colonizing space.

(Re religion: this is anecdotal, but since you brought up this example: in the past, I think religious people would have been much less successful at spreading their values if they had been more concerned about the suffering of the people they were trying to convert. The growth of religion was far from being a harm-free process.)

Thanks Will! :)

I think I haven't really thought about this possibility.

I know nothing about how things like false vacuum decay work (thankfully, I guess), about how tractable triggering it is, or about how the minds of the agents who would work on trying to trigger it operate. And my immediate impression is that these things matter a lot to whether my responses to the first two "obvious objections" sort of apply here as well and to whether "decay-conducive values" might be competitive.

However, I think we can at least confidently say that -- at least in the intra-civ selection context (see my previous post) -- a potential selection effect non-trivially favoring "decay-conducive values" during the space colonization process seems much less straightforward and obvious than the selection effect progressively favoring agents that are more and more upside-focused (on long timescales with many bits of selection). The selection steps are not the same in these two cases, and the potential dynamic that might lead decay-conducive values to take over seems more complex and fragile.

Thanks for giving arguments pointing the other way! I'm not sure #1 is relevant to our context here, but #2 is definitely worth considering. In the second post of the present sequence, I argue that something like #2 probably doesn't pan out, and we discuss an interesting counter-argument in this comment thread.

Thanks Miranda! :) 

I personally think the strongest argument for reducing malevolence is its relevance for s-risks (see section Robustness: Highly beneficial even if we fail at alignment), since I believe s-risks are much more neglected than they should be.

And the strongest counter-considerations for me would be:

  • Uncertainty regarding the value of the future. I'm generally much more excited about making the future go better rather than "bigger" (reducing X-risk does the latter), so the more reducing malevolence does the latter rather than the former, the less certain I am it should be a priority. (Again, this applies to any kind of work that reduces X-risks, though.)
  • Info / attention hazards. Perhaps the best way to avoid these malevolence scenarios is to ignore them and avoid making them more salient. 

Interesting question you asked, thanks! I added a link to this comment in a footnote. 

Right, so assuming no early value lock-in and the values of the AGI being (at least somewhat) controlled/influenced by its creators, I imagine these creators to have values that are grabby to varying extents, and these values are competing against one another in the big tournament that is cultural evolution.

For simplicity, say there are only two types of creators: the pure grabbers (who value grabbing (quasi-)intrinsically) and the safe grabbers (who are in favor of grabbing only if it is done in a "safe" way, whatever that means).

Since we're assuming there hasn't been any early value lock-in, the AGI isn't committed to some form of compromise between the values of the pure and safe grabbers. Therefore, you can imagine that the AGI allows for competition and helps both groups accomplish what they want proportionally to their size, or something like that. From there, I see two plausible scenarios:
A) The pure and safe grabbers are two cleanly separated groups running a space expansion race against one another, and we should -- all else equal -- expect the pure grabbers to win, for the same reasons why we should -- all else equal -- expect the AGI race to be won by the labs optimizing for AI capabilities rather than for AI safety.
B) The safe grabbers "infiltrate" the pure grabbers in an attempt to make their space-expansion efforts "safer", but are progressively selected against since they drag the pure-grabby project down. The few safe grabbers who might manage not to value drift and not to get kicked out of the pure grabbers are those who are complacent and not pushing really hard for more safety.

The reason why the intra-civ grabby values selection is currently fairly weak on Earth, as you point out, is that humans haven't even started colonizing space, which makes something like A or B very unlikely to have happened yet. Arguably, the process that may eventually lead to something like A or B hasn't even begun for real. We're unlikely to notice a selection for grabby values before people actually start running something like a space expansion race. And most of those we might expect to want to somehow get involved in the potential[1] space expansion race are currently focused on the race to AGI, which makes sense. It seems like this latter race is more relevant/pressing right now.

  1. ^

    It seems like this race will happen (or actually be worth running) if, and only if, AGI has non-locked-in values and is corrigible(-ish) and aligned(-ish) with its creators, as we suggested.
