Anton Leicht

Thank you for the post. Looking at realistic scenarios for positive goals is interesting and, I think, neglected work.

I have a relatively easy time imagining what a stable failure mode looks like - if everyone's dead, for instance, it seems like they're likely to stay dead. I'm somewhat less certain about how to model a truly stable success mode. What you describe seems to be, in essence, one successfully aligned AGI and how we might get there. Do you think that is sufficient to be a stable good state regarding AI safety - i.e. a state in which the collective field of AI safety can take a breath and say 'we did it, let's pack it up'? I ask this because it seems important not to be hasty about defining success modes; a false sense of security seems generally dangerous.

I would imagine you might think that one aligned AGI can prevent less well-aligned AGIs from coming into existence, but that of course might come with a concerningly powerful influence on the world. Or do you think that there is a general baseline interest in not building unaligned AGI, so that once the alignment problem is solved, there's just no reason for unaligned AGI to come into existence? Especially in a very slow, multipolar take-off scenario, an isolated success in aligning one AGI doesn't necessarily seem to translate to a global success story (even less so if you're worried about how the unaligned and aligned AIs might interact).

Another failure mode for an ostensibly stable good state is of course that you merely think the AI is aligned, and the actions suggested by its value function only later come apart from what we think it should do (this doesn't even need to be a particularly treacherous or a particularly big turn). Accordingly, some success modes might be more stable than others - i.e. in how certain we can be that the AI is actually correctly aligned and not just seemingly so.

This is a bit of a random collection of thoughts - the TL;DR question version might be: How stable do you think the success in your success stories is?