Stephen Casper, scasper@mit.edu. Thanks to Alex Lintz and Daniel Dewey for feedback.
This is a reply to, but not an objection to, a recent post from Paul Christiano titled AI alignment is distinct from its near term applications. The post is fairly brief, and its key point is well summed up by this excerpt:
I worry that companies using alignment to help train extremely conservative and inoffensive systems could lead to backlash against the idea of AI alignment itself. If such systems are held up as key successes of alignment, then people who are frustrated with them may end up associating the whole problem of alignment with “making AI systems inoffensive.”
I have no disagreement with this claim. But I would push back against the general notion that AI [existential] safety work is disjoint from near-term applications. Paul himself seems to agree:
We can develop and apply alignment techniques to these existing systems. This can help motivate and ground empirical research on alignment, which may end up helping avoid higher-stakes failures like an AI takeover.
This post argues for strongly emphasizing this point.
What do I mean by near-term applications? Any challenging problem involving consequential AI systems and their effects on society. Examples include:
- Self-driving cars
- Recommender systems
- Search engines
- AI weapons
- Cybersecurity
- AI-driven unemployment
- Bias/fairness/justice involving systems that work with humans or human data
- Misuses of media generators including text and image generators
I argue that working on these problems probably matters a lot for three reasons, the second and third of which are potential matters of existential safety.
Non-X-risks from AI are still intrinsically important AI safety issues.
There are many important non-X-risks in this world, and any altruistically minded person should care about them. For the same reasons we care about health, wealth, development, and animal welfare, we should also care about making important near-term applications of narrow AI go well for people.
There are valuable lessons to learn from near-term applications.
Suppose we figured out ways to make near-term applications of AI go very well. I find it very hard to imagine a world in which we did any of these things without developing a lot of useful technical tools and governance strategies that could be retooled or built on for higher-stakes problems later. Consider some examples:
- For self-driving cars to go well, we would need to effectively develop and iterate on techniques for robustness, reliability, and anomaly detection/handling.
- To make recommender systems go well, we would need to make a lot more progress on efficiently inferring what humans truly want in a way that is disentangled from what they seem to want.
- To minimize harms from AI weapons, we would need to introduce a lot of national and international laws and precedent for disincentivizing and responding to deadly AI.
- To minimize problems from discriminatory AI or misuses of media generators (e.g. BLOOM or Stable Diffusion), we would need laws and precedent establishing definitions of harm and methods for recourse. Perhaps most importantly, building up laws and bureaucracy around AI systems like these may help create a legal regime that provides filters and obstacles to the deployment of risky systems. If auditing and slow timelines are good, so is this kind of bureaucracy.
See also this post.
Making allies and growing the AI safety field is useful
AI safety and longtermism (AIS&L) have a lot of critics, and in the past year or so, those critics seem to have grown in number and profile. Many of them are people who work on and care a lot about near-term applications of AI. To some extent this is inevitable: having an influential and disruptive agenda will lead to some pushback from competing ones. Haters are going to hate. Trolls are going to troll.
But AIS&L probably has more detractors than it should among people who ought to be allies. Given how many forces in the world are making AI riskier, there should not be conflict between groups of people who are working on making it go better, just in different ways. In what world could isolation and mutual dismissal between AIS&L people and people working on neartermist problems be helpful? The two groups share too many common adversaries and interests not to be allies, especially when it comes to influencing AI governance. Having more friends and fewer detractors seems like it could only increase the political and research capital of the AIS&L community, and there is virtually no downside to being more popular.
I think that some of the negative press about AIS&L might be due to active or tacit dismissal of the importance of neartermist work by AIS&L people. Speaking for myself, I have had a number of conversations in the past few months with non-AIS&L people who seem sympathetic but have expressed feeling dismissed by the community, which has made them more hesitant to be involved. For this reason, we might stand to benefit a great deal from less parochialism and more friends.
Paul argues that
...companies using alignment to help train extremely conservative and inoffensive systems could lead to backlash against the idea of AI alignment itself.
But I think it is empirically, overwhelmingly clear that a much bigger source of "backlash against the idea of AI alignment itself" is the failure of the AIS&L community to engage with more neartermist work.
Thanks for reading. Constructive feedback is welcome.
If the people you ultimately want to influence are the technophiles who are building AI, who regard most near-term 'AI safety' people as annoying scolds and culture warriors, it could be good to clearly differentiate yourself from them. If existential safety people get a reputation as reliable collaborators, employees, and allies who don't support the bad behaviour of many AI bias people, this could put us in a good position.
I think I disagree with the general direction of this comment but it’s hard to state why, so I’ll just outline an alternative view:
Yes, I agree with this. I think in general there is a fair bit of social pressure to give credence to intellectually weak concerns about 'AI bias' etc., which is part of what technophiles dislike, even if they can't say so publicly. Pace your first sentence, I think that self-censorship is helpful for building reputation in some fields. As such, I expect honestly reporting an epistemically rigorous evaluation of these arguments will often suffice to cause 'isolation and mutual dismissal' from Gebru-types, even while it is positive for your reputation among 'builder' capabilities researchers.
Note that, in general, existential safety people have put a fair bit of effort into trying to cultivate good relations with near-term AI safety people. The lowest-hanging fruit implied by the argument above is to simply pull back on these activities.
Sure, but I think they are less intrinsically important for the standard ITN reasons.
I think your statement implies that we should care about them a similar amount to longtermist-motivated safety, which might be true, but you don't make a case for why we should. I don't think the reasons for prioritising LT AIS are strongly correlated with the reasons for prioritising NT AIS, so it would be somewhat surprising if this were true.
As a deep learning researcher who came to believe in the importance of AI safety through EA, I strongly agree with the last point about making allies and growing the AI safety field. I can confirm that some people feel hesitant to get involved in AI safety, or simply give up, because the community can feel somewhat cliquey and dismissive, and it sometimes seems quite fragmented over arguments for and against what's useful. To me, this feels a bit counterproductive and alienating.
I hypothesize that frowning on near-term safety work, or even just the heavy focus on questioning its usefulness, deters other current deep learning researchers, and maybe other communities too, from engaging with AI safety. Less parochialism and more friends seem like a sensible approach and would make for a more productive community.
One thing I find interesting is how similar some of the work from Bay Area AI safety folks is to work from other safety crowds, such as the area often referred to as "AI ethics." For example, Redwood worked on a paper about safe language generation, focusing on descriptions of physical harm, and safe language generation is a long-running academic research area (including for physical harm; see https://arxiv.org/pdf/2210.10045.pdf). The deepest motivating factors behind the work may differ, but this is one reason I think there is a lot of common ground across safety research areas.
+1 I think it's very worthwhile to emphasize neartermist reasons to care about work that may be primarily longtermism-oriented.
Thanks for exploring this issue! I agree that there could be more understanding between AI safety & the wider AI community, and I'm curious to do more thinking about this.
I think each of the three claims you make in the body of the text is broadly true. However, I don't think they directly back up the claim in the title that "AI safety is not separate from near-term applications".
I think there are some important ways in which AI safety is distinct: it goes one step further by imagining the capabilities of future systems and trying to anticipate ways they could go wrong ahead of time. There are some research questions it would be hard to work on if the AI safety field weren't separate from current-day application research, e.g. agent foundations, inner misalignment, and detecting deception.
Still, I think I agree with much of your sentiment. To illustrate what I mean, I would like it to be true that:
No disagreements here. I imagine AIS&L work and work on the neartermist examples I mentioned as a Venn diagram with healthy overlap. I'm glad the AIS&L community exists, and I think it tackles some truly unique problems. By "separate" in the title, I essentially meant "disjoint".