Claim: Credible plans for a "pivotal act" may drive AI race dynamics
(Epistemic status: I had Mathematica do all the grunt work and did not check the results carefully.)
Consider a simple normal-form game with two equally capable agents A and B, each of which is deciding whether to aggressively pursue AI development, and three free parameters:
- the probability $p_\text{doom}$ that accelerating AI development results in an existential catastrophe (with utility $-1$ for both agents, versus a utility-$0$ status quo)
- the utility $u_1$ of developing the first friendly AI
- the utility $u_2$, with $-1 < u_2 < u_1$, of the other agent developing friendly AI
We'll first assume the coin only gets flipped once: developing a friendly AI lets you immediately control all other AI development, so the doom risk is realized at most once no matter how many agents race.
Since our choice of parameterization was, in retrospect, one that requires a lot of typing, we'll define $u_+ = \frac{u_1 + u_2}{2}$ and $u_- = u_1 - u_+ = u_+ - u_2$, and then rescale every payoff via $x \mapsto \frac{x + p_\text{doom}}{1 - p_\text{doom}}$ to get something more readable.
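To spell out one cell of the matrix below as a sanity check (recalling that, conditional on survival, each of the two equally capable agents wins the race with probability $\frac{1}{2}$, for expected utility $u_+$): the (Accelerate, Accelerate) payoff in original units is $(1-p_\text{doom})u_+ - p_\text{doom}$, and rescaling gives

$$\frac{\left[(1-p_\text{doom})\,u_+ - p_\text{doom}\right] + p_\text{doom}}{1-p_\text{doom}} = u_+,$$

while the status-quo payoff $0$ rescales to $\frac{p_\text{doom}}{1-p_\text{doom}}$.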
|  | **Accelerate** | **Don't** |
| --- | --- | --- |
| **Accelerate** | $u_+$ (each) | $u_+ + u_-,\ u_+ - u_-$ |
| **Don't** | $u_+ - u_-,\ u_+ + u_-$ | $\frac{p_\text{doom}}{1-p_\text{doom}}$ (each) |

(Rows are A's action, columns are B's; off-diagonal cells list A's payoff first.)
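Since the epistemic status above leans on unchecked computer algebra, here is a minimal re-derivation sketch in Python/sympy (not the original Mathematica notebook; the payoff model is exactly the one described above):

```python
import sympy as sp

p, u1, u2 = sp.symbols("p u1 u2")  # p stands in for p_doom
u_plus = (u1 + u2) / 2

# Rescaling applied to every payoff: x -> (x + p) / (1 - p)
def rescale(x):
    return sp.simplify((x + p) / (1 - p))

# One-coin game, expected payoffs for agent A in original units:
payoffs = {
    # Both accelerate: one doom flip, then a fair race conditional on survival.
    ("Acc", "Acc"): (1 - p) * u_plus - p,
    # Only A accelerates: A wins outright if the coin comes up safe.
    ("Acc", "Don't"): (1 - p) * u1 - p,
    # Only B accelerates: B wins if the coin comes up safe, A gets u2.
    ("Don't", "Acc"): (1 - p) * u2 - p,
    # Status quo: nobody flips the coin.
    ("Don't", "Don't"): sp.Integer(0),
}

for cell, payoff in payoffs.items():
    print(cell, rescale(payoff))
# Prints u1/2 + u2/2, u1, u2, and p/(1 - p) respectively
# (up to how sympy normalizes fractions), matching the matrix above.
```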
Now consider the case where (Accelerate, Accelerate) instead flips two coins: each agent's project carries an independent risk of catastrophe, so both agents survive only with probability $(1-p_\text{doom})^2$.
|  | **Accelerate** | **Don't** |
| --- | --- | --- |
| **Accelerate** | $u_+(1-p_\text{doom}) - p_\text{doom}$ (each) | $u_+ + u_-,\ u_+ - u_-$ |
| **Don't** | $u_+ - u_-,\ u_+ + u_-$ | $\frac{p_\text{doom}}{1-p_\text{doom}}$ (each) |
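A companion sketch, comparing best responses to Accelerate across the two games, hints at what changes (same hypothetical setup as before; $u_- > 0$ follows from $u_2 < u_1$, and I'm additionally assuming $u_+ > 0$ for convenience):

```python
import sympy as sp

p = sp.Symbol("p", positive=True)  # p_doom
# Assumes u_+ > 0; u_- > 0 is guaranteed by u2 < u1.
up, um = sp.symbols("u_plus u_minus", positive=True)

# Rescaled payoffs to a player whose opponent plays Accelerate:
one_coin = {"Accelerate": up, "Don't": up - um}
two_coin = {"Accelerate": up * (1 - p) - p, "Don't": up - um}

# One coin: the advantage of racing against a racer is u_minus > 0,
# independent of p_doom, so (Accelerate, Accelerate) is always an equilibrium.
print(sp.simplify(one_coin["Accelerate"] - one_coin["Don't"]))  # u_minus

# Two coins: the advantage now flips sign at a finite doom probability.
gap = sp.expand(two_coin["Accelerate"] - two_coin["Don't"])
print(gap)               # -p*u_plus - p + u_minus
print(sp.solve(gap, p))  # [u_minus/(u_plus + 1)]
# Once p_doom exceeds u_-/(u_+ + 1), stopping beats racing even unilaterally.
```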
This is potentially a much safer situation: