Derek Shiller

Philosophy Researcher @ Rethink Priorities
1331 karma · Derekshiller.com


For an intervention to be a longtermist priority, there needs to be some kind of concrete story for how it improves the long-term future.

I disagree with this. With existential risk from unaligned AI, I don't think anyone has ever told a very clear story about how AI will actually get misaligned, get loose, and kill everyone. People have speculated about components of the story, but generally not in a super concrete way, and it isn't clear how standard AI safety research would address a very specific disaster scenario. I don't think this is a problem: we shouldn't expect to know all the details of how things go wrong in advance, and it is worthwhile to do a lot of preparatory research that might be helpful so that we're not fumbling through basic things during a critical period. I think the same applies to digital minds.

Your points here do not engage with the argument, made by @Zach Stein-Perlman early on in the week, that we can just punt solving AI welfare to the future (i.e., to the long reflection / to once we have aligned superintelligent advisors), and in the meantime continue focusing our resources on AI safety (i.e., on raising the probability that we make it to a long reflection).

I think this viewpoint is overly optimistic about the probability of lock-in and about the relevance of superintelligent advisors. I discuss some of the issues around lock-in in a contribution to the debate week. In brief, I think it is possible that digital minds will be so integrated within the next few decades that they will hold power in social relationships that will be extremely difficult to disentangle. I also think that AGI may be useful in drawing inferences from our assumptions, but it won't be particularly helpful at setting the right assumptions.

I generally agree that the formal thesis for the debate week set a high bar that is difficult to defend, and I think that this is a good statement of the case for that. Even if you think that AI welfare is important (which I do!), the field doesn't have the existing talent pipelines or clear strategy to absorb $50 million in new funding each year. Putting that much in over the next few years could easily make things worse. It is also possible that AI welfare could attract non-EA money, and it should aim for that rather than take funding that would otherwise go to EA cause areas.

That said, there are other points that I disagree with:

It is not good enough to simply say that an issue might have a large scale impact and therefore think it should be an EA priority, it is not good enough to simply defer to Carl Shulman's views if you yourself can't argue why you think it's "pretty likely... that there will be vast numbers of AIs that are smarter than us" and why those AIs deserve moral consideration.

I think that this is wrong. The fact that something might have a huge scale and that we might be able to do something about it is enough for it to be taken seriously, and it provides prima facie evidence that it should be a priority. I think it is vastly preferable to preempt problems before they occur rather than try to fix them once they have. For one, AI welfare is a very complicated topic that will take years or decades to sort out. AI persons (or things that look like AI persons) could easily be here in the next decade. If we don't start thinking about the issue soon, then we may be years behind when it arrives.

AI people (of some form or other) are not exactly a purely hypothetical technology, and the epistemic case for them doesn't seem fundamentally different from the case for thinking that AI safety will be an existential issue in the future, that the average intensively farmed animal leads a net-negative life, or that any given global health intervention won't have significant unanticipated negative side effects. We're dealing with deep uncertainties no matter what we do.

Additionally, it might be much harder to lobby for changes once things have gone wrong. I wish some groups had been actively lobbying against intensified animal agriculture in the 1930s (or the 1880s). It may not have been tractable, and it may not have been clear what to do, but it might have been possible to outlaw some terrible practices before they were adopted. We might have that opportunity now with AI welfare. Perhaps this means that we only need a small core group, but I do think some people should make it a priority.

I stick by my intuition, but it is really just an intuition about human behavior. Perhaps some people would be completely unbothered in that situation. Perhaps most would. (I actually find that itself worrisome in a different way, because it suggests that people may easily overlook AI wellbeing. Perhaps you have the right reasons for happily ignoring their anguished cries, but not everyone will.) This is an empirical question, really, and I don’t think we’ll know how people will react until it happens.

How could they not be conscious?

It is rare for theories of consciousness to make any demands on motivational structure.

  • Global workspace theory, for instance, says that consciousness depends on having a central repository by which different cognitive modules talk to each other. If the modules were to directly communicate point to point, there would be no conscious experiences (by that theory). I see no reason in that case why decision making would have to rely on different mechanisms.
  • Higher order theories suggest that consciousness depends on having representations of our own mental states. A creature could have all sorts of direct concerns that it never reflected on, and these could look a lot like ours.
  • IIT suggests that you could have a high-level duplicate of a conscious system that was unconscious due to the fine-grained details.
  • Etc. 

The specific things you need to change in the robots to render them not conscious depend on your theory, but I don’t think you need to go quite so far as to make them a lookup table or a transformer.

My impression was that you favor theories on which the mechanisms behind our judgments about the weirdness of consciousness are critical to conscious experience. I could imagine a robot just like us but totally non-introspective, lacking phenomenal concepts, etc. Would you think such a thing was conscious? Could it not desire things in something like the way we do?

There's another question about whether I'd actually dissect one, and maybe I still wouldn't, but this could be for indirect or emotional reasons. It could still be very unpleasant or even traumatic for me to dissect something that cries out and against the desperate pleas of its mother. Or, it could be bad to become less sensitive to such responses, when such responses often are good indicators of risk of morally significant harm. People who were confident nonhuman animals don't matter in themselves sometimes condemned animal cruelty for similar reasons.

This supports my main argument. If you value conscious experience, these emotional reasons could be concerning for the long-term future. It seems like a slippery slope from being nice to them because we find it more pleasant, to thinking that they are moral patients, particularly if we frequently interact with them. It is possible that our generation will never stop caring about consciousness, but if we’re not careful, our children might.

This case is interesting, but I think it touches on a slightly different issue. The asymbolic person presumably doesn’t care about their pretend pain. There is a more complicated story about their actions that involves their commitment to the ruse. In the robot case, I assume we’re supposed to imagine that the robots care about each other to whatever extent that unconscious things can. Their motivational structure is close to ours.

I think the case is less clear if we build up the extent to which the asymbolic child really wants the painkillers: if they constantly worry about not getting them, if they are willing to sacrifice lots of other things they care about to secure them (even though they know that it won’t help them avoid pain), and so on. Then I’m less inclined to think the case is clear-cut.

I agree! I’m used to armchair reflection, but this is really an empirical question. So much of the other discussion this week has focused on sentience. It would be good to get a sense of whether that is really the crux for the public.

Thanks Richard!

If there's nothing "all that important" about the identified pattern, whyever would we have identified it as the correct theory of consciousness to begin with?

This particular argument really speaks to the more radical physicalists. I don’t think you should be that moved by it. If I were in your shoes (rather than undecided), I think I’d be more worried that people would come to jettison their concern for consciousness for bad reasons.

One reason to reject this inference is if we accept the phenomenal intentionality thesis that consciousness is necessary for having genuinely representational states (including desires and preferences). I agree that consciousness need not be what's represented as our goal-state; but it may still be a necessary background condition for us to have real goals at all (in contrast to the pseudo-intentionality of mere thermostats and the like).

One case I had in mind while writing this was the matter of unconscious desires in a conscious person. Suppose that we have some desires that shape our motivations but which we never think about. Maybe we have a desire to be near the ocean. We don’t feel any longing, we just find ourselves quickly accepting invitations to the beach. (We also aren’t glad to receive such invitations or any happier when at the beach.) Satisfying that desire seems to me not to count for much, in large part because it has no impact on our conscious states. Would you agree? If so, would you think the intentionality thesis can make sense of this difference? Do you want to withhold intentionality from purely unconscious states in a conscious mind? Or is there a different story you would tell?

I think there is a difference between what people would say about the case and what they would do if actually in it. The question of what people would say is interesting -- I'm curious how your polling goes. But it is easier to support an intellectual stance when you're not confronted by the ramifications of your choice. (Of course, I can also see it going the other way, if we think the ramifications of your choice would harm you instead of the robot.)

I think it is valuable to have this stuff on record. If it isn't recorded anywhere, then anyone who wants to reference this position in another academic work -- even if it is the consensus within a field -- is left presenting it in a way that makes it look like their personal opinion.

Thanks for recording these thoughts!

Here are a few responses to the criticisms.

I think RP underrates the extent to which their default values will end up being the defaults for model users (particularly some of the users they most want to influence)

This is a fair criticism: we started this project planning to provide somewhat authoritative numbers, but this turned out to be more difficult than we initially expected, and we instead opted to express significant skepticism about the default choices. Where there was controversy (for instance, in how many years forward we should look), we opted for middle-of-the-road choices. I agree that it would add a lot of value to get reasonable and well-thought-out defaults. Maybe the best way to approach controversy would be to offer different sets of parameter defaults that users could toggle between based on what different people in the community think.
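To make the idea concrete, here is a minimal sketch of what toggleable worldview presets could look like. The preset names, parameter names, and values are all hypothetical placeholders, not the model's real defaults or implementation:

```python
# Hypothetical sketch of worldview presets for a cost-effectiveness model.
# Preset names, parameters, and values are illustrative placeholders,
# not the model's actual defaults.
PRESETS = {
    "near-term-focused": {"years_forward": 100, "digital_minds_per_star": 0},
    "middle-of-the-road": {"years_forward": 1_000, "digital_minds_per_star": 10**12},
    "longtermist": {"years_forward": 1_000_000, "digital_minds_per_star": 10**20},
}

def load_defaults(preset_name: str) -> dict:
    """Return a copy of the chosen preset's parameter defaults."""
    return dict(PRESETS[preset_name])

# A user (or a UI toggle) would select a preset rather than setting
# each contested parameter individually.
defaults = load_defaults("middle-of-the-road")
```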

I found it difficult to provide very large numbers on future population per star - I think with current rates of economic and compute growth, the number of digital people could be extremely high very quickly.

The ability to try to represent digital people with populations per star was a last-minute choice. We originally just aimed for that parameter to represent human populations. (It isn’t even completely obvious to me that stars are the limiting factor on the number of digital people.) However, I also think these things don’t matter much, since the main aim of the project isn’t really affected by exactly how valuable x-risk projects are in expectation. If you think there may be large populations, the model is going to imply incredibly high rates of return on extinction risk work. Whether those are the obvious choice or not depends not on exactly how high the return is, but on how you feel about the risk, and the risk itself won't change with massively higher populations.
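To illustrate why the exact figure matters less than one's attitude toward low-probability payoffs, here is a back-of-the-envelope sketch (all numbers are made up, and this is not the model's actual calculation): the expected value of extinction-risk work scales linearly with the assumed future population, so any "large population" assumption already implies an enormous expected return, while the probability of success stays the same.

```python
# Back-of-the-envelope sketch with made-up numbers; not the model's actual code.
# Expected value scales linearly with the assumed future population, so a larger
# population assumption changes the payoff, not the (tiny) probability of success.

def expected_lives_saved(p_avert: float, n_stars: float, pop_per_star: float) -> float:
    """Expected future lives enabled by a project that averts extinction with probability p_avert."""
    return p_avert * n_stars * pop_per_star

for pop_per_star in (1e10, 1e15, 1e20):  # human-scale vs. digital-mind scales
    ev = expected_lives_saved(p_avert=1e-9, n_stars=1e9, pop_per_star=pop_per_star)
    print(f"pop/star = {pop_per_star:.0e} -> expected lives saved ~ {ev:.0e}")
```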

I think some x-risk interventions could plausibly have very long run effects on x-risk (e.g. by building an aligned super intelligence)

If you think we’ll likely have an aligned super-intelligence within 100 years, then you might try to model this by setting risks very low after the next century and treating your project as just a small boost on its eventual discovery. However, you might not think that either superaligned AI or extinction is inevitable. One thing we don’t try to do is model trajectory changes, and those seem potentially hugely significant, but also rather difficult to model with any degree of confidence.
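For instance, one crude way to encode that assumption (a sketch with placeholder numbers, not how the model actually handles it) is to hold annual extinction risk at some level for the next century, drop it to near zero afterwards, and see how much long-run survival probability a small near-term risk reduction buys:

```python
# Crude sketch with placeholder numbers; not the model's actual treatment of risk.
# Annual extinction risk is r_near for the first `horizon` years and r_far afterwards
# (e.g. because an aligned superintelligence keeps later risk very low).

def survival_probability(years: int, r_near: float, r_far: float, horizon: int = 100) -> float:
    p = 1.0
    for t in range(years):
        p *= 1.0 - (r_near if t < horizon else r_far)
    return p

baseline = survival_probability(1_000, r_near=0.002, r_far=1e-5)
with_boost = survival_probability(1_000, r_near=0.0019, r_far=1e-5)  # small near-term risk cut
print(f"baseline survival over 1,000 years: {baseline:.3f}")
print(f"with a small near-term risk reduction: {with_boost:.3f}")
```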

The x-risk model seems to confuse existential risk and extinction risk (medium confidence - maybe this was explained somewhere, and I missed it)

We distinguish extinction risk from risks of sub-extinction catastrophes, but we don’t model any kind of as-bad-as-extinction risks.
