[[THIRD EDIT: Thanks so much for all of the questions and comments! There are still a few more I'd like to respond to, so I may circle back to them a bit later, but, due to time constraints, I'm otherwise finished up for now. Any further comments or replies to anything I've written are also still appreciated!]]
Hi!
I'm Ben Garfinkel, a researcher at the Future of Humanity Institute. I've worked on a mixture of topics in AI governance and in the somewhat nebulous area FHI calls "macrostrategy", including: the long-termist case for prioritizing work on AI, plausible near-term security issues associated with AI, surveillance and privacy issues, the balance between offense and defense, and the obvious impossibility of building machines that are larger than humans.
80,000 Hours recently released a long interview I recorded with Howie Lempel, about a year ago, where we walked through various long-termist arguments for prioritizing work on AI safety and AI governance relative to other cause areas. The longest and probably most interesting stretch explains why I no longer find the central argument in Superintelligence, and in related writing, very compelling. At the same time, I do continue to regard AI safety and AI governance as high-priority research areas.
(These two slide decks, which were linked in the show notes, give more condensed versions of my views: "Potential Existential Risks from Artificial Intelligence" and "Unpacking Classic Arguments for AI Risk." This piece of draft writing instead gives a less condensed version of my views on classic "fast takeoff" arguments.)
Although I'm most interested in questions related to AI risk and cause prioritization, feel free to ask me anything. I'm likely to eventually answer most questions that people post this week, on an as-yet-unspecified schedule. You should also feel free just to use this post as a place to talk about the podcast episode: there was a thread a few days ago suggesting this might be useful.
Have you considered doing a joint standup comedy show with Nick Bostrom?
I want to push back against this, from one of your slides:
I feel like the LW community did notice many important issues with the classic arguments. Personally, I was/am pessimistic about AI risk, but thought my reasons were not fully or mostly captured by those arguments, and I saw various issues/caveats with them that I talked about on LW. I'm going to just cite my own posts/comments because they're the easiest to find, but I'm sure there were lots of criticisms from others too. 1 2 3 4
Of course I'm glad that you thought about and critiqued those arguments in a more systematic and prominent way, but it seems wrong to say or imply that nobody noticed their issues until now.
Do you think there were any deficits in epistemic modesty in the way the EA community prioritised AI risk, or do you think it was more that no-one sat down and examined the object-level arguments properly? Alternatively, do you think that there was too much epistemic modesty in the sense that everyone just deferred to everyone else on AI risk?
I feel that something went wrong, epistemically, but I'm not entirely sure what it was.
My memory is that, a few years ago, there was a strong feeling within the longtermist portion of the EA community that reducing AI risk was far-and-away the most urgent problem. I remember there being a feeling that the risk was very high, that short timelines were more likely than not, and that the emergence of AGI would likely be a sudden event. I remember it being an open question, for example, whether it made sense to encourage people to get ML PhDs, since, by the time they graduated, it might be too late. There was also, in my memory, a sense that all existing criticisms of the classic AI risk arguments were weak. It seemed plausible that the longtermist EA community would pretty much just become an AI-focused community. Strangely, I'm a bit fuzzy on what my own views were, but I think they were at most only a bit out-of-step.
This might be an exaggerated memory. The community is also, obviously, large enough for my experience to be significantly non-representative. (I'd be interested in whether the above description resonates with anyone else.) But, in any case, I am pretty confident that…
FWIW, it mostly doesn't resonate with me. (Of course, my experience is no more representative than yours.) Like you, I'd be curious to hear from more people.
I think what matches my impression most is that:
On the other points, my impression is that if there were consistent and significant changes in views they must have happened mostly among people I rarely interact with personally, or more than 3 years ago.
- One shift in views that has had major real-world consequences is Holden Karnofsky, and by extension Open Phil, taking AI risk more seriously. He posted about this in September 2016, so presumably he changed his mind over the months prior to that.
- I started to engage more deeply with public discussion…
I think that instead of talking about potential failures in the way the EA community prioritized AI risk, it might be better to talk about something more concrete, e.g.
If we think there were mistakes in the concrete actions people have taken, e.g. mistaken funding decisions or mistaken career changes (I'm not sure that there were), I think we should look at the process that led to those decisions and address that process directly.
Targeting ‘the views of the average EA’ seems pretty hard. I do think it might be important, because it has downstream effects on things like recruitment, external perception, funding, etc. But then I think we need to have a story for how we affect the views of the average EA (as Ben mentions). My guess is that we don’t have a story like that, and that’s a big part of ‘what went wrong’: the movement is growing in a chaotic way that no individual is responsible for, and that can lead to collectively bad epistemics.
‘Encouraging EAs to ...
Which of the EA-related views you hold are least popular within the EA community?
I'm not sure how unpopular these actually are, but a few at least semi-uncommon views would be:
- I'm pretty sympathetic to non-naturalism, in the context of both normativity and consciousness.
- Controlling for tractability, I think it's probably more important to improve the future (conditional on humanity not going extinct) than to avoid human extinction. (The gap between a mediocre future or bad future and the best possible future is probably vast.)
- I don't actually know what my credence is here, since I haven't thought much about the issue, but I'm probably more concerned about growth slowing down and technological progress stagnating than the typical person in the community.
What are the key issues or causes that longtermists should invest in, in your view? And how much should we invest in them, relatively speaking? What issues are we currently under-investing in?
Have you had any responses from Bostrom or Yudkowsky to your critiques?
Would you rather be one or two dogs?
I'm sorry, but I consider that a very personal question.
Hi Ben. I just read the transcript of your 80,000 Hours interview and am curious how you'd respond to the following:
Analogy to agriculture, industry
You say that it would be hard for a single person (or group?) acting far before the agricultural revolution or industrial revolution to impact how those things turned out, so we should be skeptical that we can have much effect now on how an AI revolution turns out.
Do you agree that the goodness of this analogy is roughly proportional to how slow our AI takeoff is? For instance if the first AGI ever created becomes more powerful than the rest of the world, then it seems that anyone who influenced the properties of this AGI would have a huge impact on the future.
Brain-in-a-box
You argue that if we transition more smoothly from super powerful narrow AIs that slowly expand in generality to AGI, we'll be less caught off guard / better prepared.
It seems that even in a relatively slow takeoff, you wouldn't need that big of a discontinuity to result in a singleton AI scenario. If the first AGI that's significantly more generally intelligent than a human is created in a world where lots of powerful narrow AIs exist, wouldn't…
What would you recommend as the best introduction to concerns (or lack thereof) about risks from AI?
If you have time and multiple recommendations, I would be interested in a taxonomy. (E.g. this is the best blog post for non-technical readers, this is the best book-length introduction for CS undergrads.)
I agree with Aidan's suggestion that Human Compatible is probably the best introduction to risks from AI (for both non-technical readers and readers with CS backgrounds). It's generally accessible and engagingly written, it's up-to-date, and it covers a number of different risks. Relative to many other accounts, I think it also has the virtue of focusing less on any particular development scenario and expressing greater optimism about the feasibility of alignment. If someone's too pressed for time to read Human Compatible, the AI risk chapter in The Precipice would then be my next best bet. Another very readable option, mainly for non-CS people, would be the AI risk chapters in The AI Does Not Hate You: I think they may actually be the cleanest distillation of the "classic" AI risk argument.
For people with CS backgrounds, hoping for a more technical understanding of the problems safety/alignment researchers are trying to solve, I think that Concrete Problems in AI Safety, Scalable Agent Alignment Via Reward Modeling, and Rohin Shah's blog post sequence on "value learning" are especially good picks. Although none of these resources frames safety/alignment research as something that's…
This seems like a promising topic for an EA Forum question. I would consider creating one and reposting your comment as an answer to it. A separate question is probably also a better place to collect answers than this thread, which is best reserved for questions addressed to Ben and for Ben's answers to those questions.
What do you think is the probability of AI causing an existential catastrophe in the next century?
I currently give it something in the 0.1%-1% range.
For reference: My impression is that this is on the low end, relative to estimates that other people in the long-termist AI safety/governance community would give, but that it's not uniquely low. It's also, I think, more than high enough to justify a lot of work and concern.
I am curious whether you are, in general, more optimistic about x-risks [say, than Toby Ord]. What are your estimates of total and unforeseen anthropogenic risks in the next century?
What have you changed your mind about recently?
Suppose there was an operational long-term investment fund à la Phil Trammell. Where would you donate?
From the podcast transcript:
I continue to have a lot of uncertainty about how likely it is that AI development will look like "there’s this separate project of trying to figure out what goals to give these AI systems" vs a development process where capability and goals are necessarily connected. (I didn't find your arguments…
Planned summary of the podcast episode for the Alignment Newsletter:
I have nothing to add to the discussion but wanted to say that this was my favourite episode, which, given how big a fan I am of the podcast, is a very high bar.
How entrenched do you think are old ideas about AI risk in the AI safety community? Do you think that it's possible to have a new paradigm quickly given relevant arguments?
I'd guess that, as with most scientific endeavours, there are many social factors that make people biased toward their own old ways of thinking. Research agendas and institutions are built on some basic assumptions which, if changed, could be disruptive to the people or organisations involved. However, there seems to be a lot of engagement with the underlying questions about the paths to superintelligence and their consequences, and the research community today is also heavily involved with the rationality community; both of these make me hopeful that more minds can be changed given appropriate argumentation.
What is your theory of change for work on clarifying arguments for AI risk?
Is the focus more on immediate impact on funding/research, or on the next generation? Do you feel this work is important more for directing effort to the most important paths, or for understanding how sure we are about all this AI stuff so that we can grow the field or deprioritize it accordingly?
You say that there hasn't been much literature arguing for Sudden Emergence (the claim that AI progress will look more like the brain-in-a-box scenario than the gradual-distributed-progress scenario). I am interested in writing some things on the topic myself, but currently think it isn't decision-relevant enough to be worth prioritizing. Can you say more about the decision-relevance of this debate?
Toy example: Suppose I write something that triples everyone's credence in Sudden Emergence. How does that change what people do, in a way that makes the world better (or worse, depending on whether Sudden Emergence is true or not!)?
I would be really interested in you writing on that!
It's a bit hard to say what the specific impact would be, but beliefs about the magnitude of AI risk of course play at least an implicit role in lots of career/research-focus/donation decisions within the EA community; these beliefs also affect the extent to which broad EA orgs focus on AI risk relative to other cause areas. And I think that people's beliefs about the Sudden Emergence hypothesis at least should have a large impact on their level of doominess about AI risk; I regard it as one of the biggest cruxes. So I'd at least be hopeful that, if everyone's credences in Sudden Emergence changed by a factor of three, this would have some sort of impact on the portion of EA attention devoted to AI risk. I think that credences in the Sudden Emergence hypothesis should also have an impact on the kinds of risks/scenarios that people within the AI governance and safety communities focus on.
I don't, though, have a much more concrete picture of the influence pathway.
How confident are you in the brief arguments for rapid and general progress outlined in section 1.1 of GovAI's research agenda? Have the arguments been developed further?
What is your overall probability that we will, in this century, see progress in artificial intelligence that is at least as transformative as the industrial revolution?
What is your probability for the more modest claim that AI will be at least as transformative as, say, electricity or railroads?
I think this is a little tricky. The main way in which the Industrial Revolution was unusually transformative is that, over the course of the IR, there were apparently unusually large pivots in several important trendlines. Most notably, GDP-per-capita began to increase at a consistently much higher rate. In more concrete terms, though, the late nineteenth and early twentieth centuries probably included even greater technological transformations.
From David Weil's growth textbook (pg. 265-266):
In the episode you say:
I was wondering what you think of the potential of broader attempts to influence the long-run future (e.g. promoting positive values, growing the EA movement) as opposed to the more targeted attempts to reduce x-risks that are most prominent in the EA movement.
In "Unpacking Classic Arguments for AI Risk", you defined The Process Orthogonality Thesis as: The process of imbuing a system with capabilities and the process of imbuing a system with goals are orthogonal.
Then you gave several examples of cases where this does not hold: a thermostat, Deep Blue, OpenAI Five, the human brain. Could you elaborate a bit on these examples?
I am a bit confused by them. In the case of Deep Blue, I think most of the progress came from general computational advances, with the evaluation system applied later. The human brain…
Do you still think that Robin Hanson's critique of Christiano's scenario is worth exploring in more detail?
I do think there's still more thinking to be done here, but, since I recorded the episode, Alexis Carlier and Tom Davidson have actually done some good work in response to Hanson's critique. I was pretty persuaded of their conclusion:
On a scale from 1 to 10 what would you rate The Boss Baby? :)
I actually haven't seen The Boss Baby. A few years back, this ad was on seemingly all of the buses in Oxford for a really long time. Something about it made a lasting impression on me. Maybe it was the smug look on the boss baby's face.
Reviewing it purely on priors, though, I'll give it a 3.5 :)
What priorities for TAI strategy does your skepticism towards the classical work dictate? Some have argued that we have greater leverage over scenarios with discrete/discontinuous deployment.
What writings have influenced your thinking the most?
What are the arguments that speeding up economic growth has a positive long run impact?
What do you think is the most important role people without technical/quantitative educational backgrounds can play in AI safety/governance?
Hi Ben - this episode really gave me a lot to think about! Of the 'three classic arguments' for AI X-risk you identify, I argued in a previous post that the 'discontinuity premise' rests on taking too literally a high-level argument that should only be used to establish that sufficiently capable AI will produce very fast progress, and then assuming that the 'fast progress' has to happen suddenly and within a specific AI.
Your discussion of the other two arguments led me to conclude that the same sort of mistake is at work in all of them, as I…
Wow, I am quite surprised it took a year to produce. @80K, does it always take so long?
There's often a few months between recording and release and we've had a handful of episodes that took a frustratingly long time to get out the door, but never a year.
The time between the first recording and release for this one was actually 9 months. The main reason was that Howie and Ben wanted to go back and re-record a number of parts they didn't think they got right the first time around, and it took a while for them both to be free and in the same place so they could do that.
A few episodes were also pushed back so we could get out COVID-19 interviews during the peak of the epidemic.
This sounds like a status move. I asked a sincere question and maybe I didn't think too carefully when I asked it, but there's no need to rub it in.
Thanks, I appreciate the clarification! :)
Hi Ben,
You suggested in the podcast that it's not clear how to map some of the classic arguments—and especially their manifestation in thought experiments like the paper clip maximizer—to contemporary machine learning methods. I'd like to push back on that view.
Deep reinforcement learning is a popular contemporary ML approach for training agents that act in simulated and real-world environments. In deep RL, an agent is trained to maximize its reward (more precisely, the sum of discounted rewards over time steps), which perfectly fits the "agent" abstraction…
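For readers who want that objective made concrete, here is a minimal sketch of the discounted-return quantity being described. The plain-Python framing and the function name are illustrative assumptions of mine, not anything from the podcast or the comment above; actual deep RL systems maximize the expectation of this quantity under the agent's policy.

```python
# Minimal sketch (illustrative only) of the "sum of discounted rewards over
# time steps" that a deep RL agent is trained to maximize.

def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**t * r_t over a finite sequence of per-step rewards."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# Example: three time steps with rewards 1.0, 0.0, 2.0
# 1.0 + 0.99 * 0.0 + 0.99**2 * 2.0 = 2.9602
print(discounted_return([1.0, 0.0, 2.0]))
```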
You discuss at one point in the podcast the claim that, as AI systems take on larger and larger real-world problems, the challenge of defining the reward function will become more and more important. For example, for cleaning, the simple number-of-dust-particles objective is inadequate because we care about many other things (e.g. keeping the house tidy) and many side constraints (e.g. avoiding damaging household objects). This isn't quite an argument for AI alignment solving itself, but it is an argument that the attention and resources poured into AI alignment…
Sorry if this isn’t as polished as I’d hoped. Still a lot to read and think about, but posting as I won’t have time now to elaborate further before the weekend. Thanks for doing the AMA!
It seems like a crux that you have identified is how “sudden emergence” happens. How would a recursive self-improvement feedback loop start? Increasing optimisation capacity is a convergent instrumental goal. But how exactly is that goal reached? To give the most pertinent example - what would the nuts and bolts of it be for it happening…
Thoughts on modifications/improvements to The Windfall Clause?
What do you think about hardware-based forecasts for human-substitute AI?
Great interview, thanks for some really thought-provoking ideas. For the brain in the box section, it seemed like you were saying that we'd expect future worlds to have fairly uniform distributions of capabilities of AI systems, and so we'd learn from other similar cases. How uniform do you think the spread of capabilities of AI systems is now, and how wide do you think the gaps have to be in the future for the 'brain in a box' scenario to be possible?
Have you become more uncertain/optimistic about the arguments in favour of the importance of other x-risks as a result of scrutinising AI risk?
From a Bayesian perspective, there is no particular reason why you have to provide more evidence if you provide credences, and in general I think there is a lot of value in people providing credences even if they don't provide additional evidence, if only to avoid problems of ambiguous language.
You seem to have switched from the claim that EAs often report their credences without articulating the evidence on which those credences rest, to the claim that EAs often lack evidence for the credences they report. The former claim is undoubtedly true, but it doesn't necessarily describe a problematic phenomenon. (See Greg Lewis's recent post; I'm not sure if you disagree.) The latter claim would be very worrying if true, but I don't see reason to believe that it is. Sure, EAs sometimes lack good reasons for the views they espouse, but this is a general phenomenon unrelated to the practice of reporting credences explicitly.
What are your thoughts on AI policy careers in government? I'm somewhat skeptical, for two main reasons:
1) It's not clear that governments will become leading actors in AI development; by default I expect this not to happen. Unlike with nuclear weapons, governments don't need to become experts in the technology to wield AI-based weapons; they can just purchase them from contractors. Beyond military power, competition between nations is mostly economic. Insofar as AI is an input to this, governments have an incentive to invest in domestic AI…