This is a linkpost for https://www.greaterwrong.com/posts/bkG4qj9BFEkNva3EX/ai-development-incentive-gradients-are-not-uniformly
This is a post I wrote about a couple of simple models of multi-actor AI development (the initial work was done partly with Nuño Sempere).
Here's the intro:
Perhaps you think that your values will be best served if the AGI you (or your team, company, or nation) are developing is deployed first. Would you decide that it’s worth cutting a few corners, reducing your safety budget, and pushing ahead to try to get your AI out the door first?
It seems plausible, and worrying, that you might. And if your competitors reason symmetrically, we would get a “safety race to the bottom”.
On the other hand, perhaps you think your values will be better served if your enemy wins than if either of you accidentally produces an unfriendly AI. Would you decide that improving your chances isn’t worth the cost to safety?
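To make that tradeoff concrete, here's a rough toy sketch, purely illustrative and not one of the models from the post: the linear safety-to-speed and safety-to-friendliness relationships, the payoff values, and the `expected_payoff` helper are all assumptions made up for this example.

```python
# Toy sketch (illustrative assumptions only): two labs each pick a safety
# level s in [0, 1]. Lower safety means faster development, so a lab's chance
# of deploying first is proportional to its speed (1 - s). If a lab deploys,
# its AI is friendly with probability equal to its safety level.
# Hypothetical payoffs: 1 if my friendly AI wins, 0.5 if the rival's friendly
# AI wins, 0 if whoever wins deploys an unfriendly AI.

def expected_payoff(s_mine, s_rival, win_value=1.0, rival_win_value=0.5):
    speed_mine, speed_rival = 1.0 - s_mine, 1.0 - s_rival
    total = speed_mine + speed_rival
    p_win = speed_mine / total if total > 0 else 0.5  # chance I deploy first
    return (p_win * s_mine * win_value                # I win, AI is friendly
            + (1 - p_win) * s_rival * rival_win_value)  # rival wins, friendly

# How does my best response shift as the rival cuts corners?
for s_rival in (0.9, 0.5, 0.1):
    best = max((expected_payoff(s / 10, s_rival), s / 10) for s in range(11))
    print(f"rival safety {s_rival:.1f} -> my best safety {best[1]:.1f} "
          f"(expected payoff {best[0]:.2f})")
```

In this toy setup, as the rival cuts corners, my own best-response safety level drifts downward too: the race-to-the-bottom dynamic in miniature.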
Over on LessWrong, I split the comments into separate threads by area of concern. I think it would be a bit too noisy to duplicate that here, but do feel free to bring up any issues!