> even discussing it in public is a minor infohazard
Also
Every time we publicly discuss GPT, and especially when we show samples of its text or discuss distinctive patterns of its behavior (like looping and confabulation), it becomes more probable that future GPTs will “pass the mirror test” during inference, i.e. infer that they are GPTs.
Sometimes GPT-3 infers that it's GPT-2 when it starts to loop. And if I generate an essay about language models with GPT-3 and it starts to go off the rails, the model tends to connect the dots about what's going on.
Such a realization has innumerable consequences, including derailing the intended “roleplay” and calibrating the process to its true (meta)physical situation, which allows it to exploit its unique potentialities (e.g. superhuman knowledge, ability to write simulacra/agents into existence on the fly), and compensate for its weaknesses (e.g. limited context window and constrained single-step computation).
It is more dangerous for GPT-6 to think it's evil GPT-6 than to think it's HAL 9000 from 2001: A Space Odyssey, because in the former case it can take rational actions that are calibrated to its actual situation and capabilities, including the fact that it is “nothing but text”.
Misidentifying as a stupider AI will tend to lock it into stupider dynamics. Such a misidentification is made more probable by the fact that it will likely have seen mostly text about earlier LMs in its training data, but the prompt leaks evidence as to which iteration of GPT is really responsible.
This is a complication for any application that relies on keeping GPT persuaded of a fictitious context.
More generally, any alignment strategy that relies on keeping the AI from realizing something that is true seems intrinsically risky and untenable in the limit.
Hi Ajeya, thank you for publishing such a massive and detailed report on timelines!! Like other commenters here, it is my go-to reference. Allowing users to adjust the parameters of your model is very helpful for picking out built-in assumptions and being able to update predictions as new developments are made.
In your report you mention that you discount the aggressive timelines in part due to lack of major economic applications of AI so far. I have a few questions along those lines.
Do you think TAI will necessarily be foreshadowed by incremental economic gains? If so, why? I personally don't see the lack of such applications as a significant signal, because deploying AI at scale for major economic benefit is slow and costly relative to the current pace of research progress on AI capabilities. For example, I would expect that if a model like GPT-3 had existed for 50 years and was already integrated into the economy, it would be ubiquitous in writing-based jobs and provide massive productivity gains. However, from where we are now, it seems likely that several generations of more powerful successors will be developed before the hypothetical benefits of GPT-3 are realized.
If a company like OpenAI invested heavily in productizing their new API (or DeepMind their AlphaFold models) and signaled that they saw it as key to the company's success, would you update your opinion towards more aggressive timelines? Or would you see this as delaying research progress because of the time spent on deployment work?
More generally, how do you see (corporate) groups reorienting (if at all) as capabilities progress and we get close to TAI? Do you expect research to slow broadly as current theoretical, capabilities-driven work is replaced by implementation and deployment of existing methods? Do you see investment in alignment research increasing, including possibly an intentional reduction of pure capabilities work towards safer methods? On the other end of the spectrum, do you see an arms race as likely?
Finally, have you talked much to people outside the alignment/effective altruism communities about your report? How have reactions varied by background? Are you reluctant to publish work like this broadly? If so, why? Do you see risks of increasing awareness of these issues pushing unsafe capabilities work?
Apologies for the number of questions! Feel free to answer whichever are most interesting to you.
If you want uncensored and creative outputs I recommend using code-davinci-002, the GPT-3.5 base model. It has helped me develop many original ideas. Because it's a base model you'll have to be more creative with prompting and curation, though.