IMO it is harmful on expectation for a technical safety researcher to work at DeepMind, OpenAI or Anthropic.
Four reasons:
- Interactive complexity. The intractability of catching up – by trying to invent general methods for AI corporations to somehow safely contain model interactions, as other engineers scale models' combinatorial complexity and outside connectivity.
- Safety-capability entanglements
- Commercialisation. Model inspection and alignment techniques can support engineering and productisation of more generally useful automated systems.
- Infohazards. Researching capability risks within an AI lab can inspire researchers hearing about your findings to build new capabilities.
- Shifts under competitive pressure
- DeepMind merged with Google Brain to do commercialisable research,
OpenAI set up a company and partnered with Microsoft to release ChatGPT,
Anthropic pitched to investors they'd build a model 10 times more capable. - If you are an employee at one of these corporations, higher-ups can instruct you to do R&D you never signed up to do.[1] You can abide, or get fired.
- Working long hours surrounded by others paid like you are, by a for-profit corp, is bad for maintaining bearings and your epistemics on safety.[2]
- DeepMind merged with Google Brain to do commercialisable research,
- Safety-washing. Looking serious about 'safety' helps labs to recruit idealistic capability researchers, lobby politicians, and market to consumers.
- 'let's build AI to superalign AI'
- 'look, pretty visualisations of what's going on inside AI'
This is my view. I would want people to engage with the different arguments, and think for themselves what ensures that future AI systems are actually safe.
- ^
I heard via via that Google managers are forcing DeepMind safety researchers to shift some of their hours to developing Gemini for product-ready launch.
I cannot confirm whether that's correct. - ^
For example, I was in contact with a safety researcher at an AGI lab who kindly offered to read my comprehensive outline on the AGI control problem, to consider whether to share with colleagues. They also said they're low energy. They suggested I'd remind them later, and I did, but they never got back to me. They're simply too busy it seems.
Do you think it would be better if no one who worked at OpenAI / Anthropic / Deepmind worked on safety? If those organizations devoted less of their budget to safety? (Or do you think we should want them to hire for those roles, but hire less capable or less worried people, so individuals should avoid potentially increasing the pool of talent from which they can hire?)
(Let me get back on this when I find time, hopefully tomorrow)
It depends on what you mean with 'work on safety'.
Standard practice for designing machine products to be safe in other established industries is to first narrowly scope the machinery's uses, the context of use, and the user group.
If employees worked at OpenAI / Anthropic / Deepmind on narrowing their operational scopes, all power to them! That would certainly help. It seems that leadership, who aim to design unscoped automated machinery to be used everywhere for everyone, would not approve though.
If working on safety means in effect playing close to a ceremonial role, where even though you really want to help, you cannot hope to catch up with the scaling efforts, I would reconsider. In other industries, when conscientious employees notice engineering malpractices that are already causing harms across society, sometimes one of them has the guts to find an attorney and become a whistleblower.
Also, in that case, I would prefer the AGI labs to not hire for those close-to-ceremonial roles.
I'd prefer them to be bluntly transparent to people in society that they are recklessly scaling ahead, and that they are just adding local bandaids to the 'Shoggoth' machinery.
Not that that is going to happen anyway.
If AGI labs can devote their budget to constructing operational design domains, I'm all up.
Again, that's counter to the leaders' intentions. Their intention is to scale everywhere and rely on the long-term safety researchers to tell them that there must be some yet-undiscovered general safe control patch.
I think we should avoid promoting AGI labs as a place to work at, or a place that somehow will improve safety. One of the reasons is indeed that I want us to be clear to idealistic talented people that they should really reconsider investing their career into supporting such an organisation.
BTW, I'm not quite answering from your suggested perspective of what an AGI lab "should do".
What feels relevant to me is what we can personally consider to do – as individuals connected into larger communities – so things won't get even worse.
I think I agree that safety researchers should prefer not to take a purely ceremonial role at a big company if they have other good options, but I'm hesitant to conclude that no one should be willing to do it. I don't think it is remotely obvious that safety research at big companies is ceremonial.
There are a few reasons why some people might opt for a ceremonial role:
It is good for some AI safety researchers to have access to what is going on at top labs, even if they can't do anything about it. They can at least keep tabs on it and can use that experience later in their careers.
It seems bad to isolate capabilities researchers from safety concerns. I bet capabilities researchers would take safety concerns more seriously if they eat lunch every day with someone who is worried than if they only talk to each other.
If labs do engage in behavior that is flagrantly reckless, employees can act as whistleblowers. Non-employees can't. Even if they can't prevent a disaster, they can create a paper trail of internal concerns which could be valuable in the future.
Internal politics might change and it seems better to have people in place already thinking about these things.
This is the crux for me.
If some employees actually have the guts to whistleblow on current engineering malpractices, I have some hope left that having AI safety researchers at these labs still turns out “net good”.
If this doesn’t happen, then they can keep having conversations about x-risks with their colleagues, but I don’t quite see when they will put up a resistance to dangerous tech scaling. If not now, when?
We’ve seen in which directions internal politics change, as under competitive pressures.
Nerdy intellectual researchers can wait that out as much as they like. That would confirm my concern here.
Plenty of concrete practices you can whistleblow on that will be effective in getting society to turn against these companies:
Pick what you’re in a position to whistleblow on.
Be very careful to prepare well. You’re exposing a multi-billion-dollar company. First meet in person with an attorney experienced in protecting whistleblowers.
Once you start collecting information, make photographs with your personal phone, rather than screenshots or USB copies that might be tracked by software. Make sure you’re not in line of sight of an office camera or webcam. Etc. Etc.
Preferably, before you start, talk with an experienced whistleblower about how to maintain anonymity. The more at ease you are there, the more you can bide your time, carefully collecting and storing information.
If you need information to get started, email me at remmelt.ellen[a/}protonmail<d0t>com.
~ ~ ~
But don’t wait it out until you can see some concrete dependable sign of “extinction risk”. By that time, it’s too late.
80,000 Hours handpicks jobs at AGI labs.
Some of those jobs don't even focus on safety – instead they look like policy lobbying roles or engineering support roles.
Nine months ago, I wrote my concerns to 80k staff:
80k removed one of the positions I flagged:
Software Engineer, Full-Stack, Human Data Team (reason given: it looked potentially more capabilities-focused than the original job posting that came into their system).
For the rest, little has changed:
research engineer product, prompt engineer, IT support, senior software engineer.
Two others in AI Safety also discovered the questionable job listings. They are disappointed in 80k.
Feeling exasperated about this. Thinking of putting out another post just to discuss this issue.
Hi Remmelt,
Thanks for sharing your concerns, both with us privately and here on the forum. These are tricky issues and we expect people to disagree about how to about how to weigh all the considerations — so it’s really good to have open conversations about them.
Ultimately, we disagree with you that it's net harmful to do technical safety research at AGI labs. In fact, we think it can be the best career step for some of our readers to work in labs, even in non-safety roles. That’s the core reason why we list these roles on our job board.
We argue for this position extensively in my article on the topic (and we only list roles consistent with the considerations in that article).
Some other things we’ve published on this topic in the last year or so:
Benjamin
Hi Remmelt,
Just following up on this — I agree with Benjamin’s message above, but I want to add that we actually did add links to the “working at an AI lab” article in the org descriptions for leading AI companies after we published that article last June.
It turns out that a few weeks ago the links to these got accidentally removed when making some related changes in Airtable, and we didn’t notice these were missing — thanks for bringing this to our attention. We’ve added these back in and think they give good context for job board users, and we’re certainly happy for more people to read our articles.
We also decided to remove the prompt engineer / librarian role from the job board, since we concluded it’s not above the current bar for inclusion. I don’t expect everyone will always agree with the judgement calls we make about these decisions, but we take them seriously, and we think it’s important for people to think critically about their career choices.
Hi Conor,
Thank you.
I’m glad to see that you already linked to clarifications before. And that you gracefully took the feedback, and removed the prompt engineer role. I feel grateful for your openness here.
It makes me feel less like I’m hitting a brick wall. We can have more of a conversation.
~ ~ ~
The rest is addressed to people on the team, and not to you in particular:
There are grounded reasons why 80k’s approaches to recommending work at AGI labs – with the hope of steering their trajectory – has supported AI corporations to scale. While disabling efforts that may actually prevent AI-induced extinction.
This concerns work on your listed #1 most pressing problem. It is a crucial consideration that can flip your perceived total impact from positive to negative.
I noticed that 80k staff responses so far started by stating disagreement (with my view), or agreement (with a colleague’s view).
This doesn’t do discussion of it justice. It’s like responding to someone’s explicit reasons for concern that they must be “less optimistic about alignment”. This ends reasoned conversations, rather than opens them up.
Something I would like to see more of is individual 80k staff engaging with the reasoning.
Ben, it is very questionable that 80k is promoting non-safety roles at AGI labs as 'career steps'.
Consider that your model of this situation may be wrong (account for model error).
I did read that compilation of advice, and responded to that in an email (16 May 2023):
"Dear [a],
People will drop in and look at job profiles without reading your other materials on the website. I'd suggest just writing a do-your-research cautionary line about OpenAI and Anthropic in the job descriptions itself.
Also suggest reviewing whether to trust advice on whether to take jobs that contribute to capability research.
Totally up to you of course.
Warm regards,
Remmelt"
This is what the article says:
"All that said, we think it’s crucial to take an enormous amount of care before working at an organisation that might be a huge force for harm. Overall, it’s complicated to assess whether it’s good to work at a leading AI lab — and it’ll vary from person to person, and role to role."
So you are saying that people are making a decision about working for an AGI lab that might be (or actually is) a huge force for harm. And that whether it's good (or bad) to work at an AGI lab depends on the person – ie. people need to figure this out for them personally.
Yet you are openly advertising various jobs at AGI labs on the job board. People are clicking through and applying. Do you know how many read your article beforehand?
~ ~ ~
Even if they did read through the article, both the content and framing of the advice seems misguided. Noticing what is emphasised in your considerations.
Here are the first sentences of each consideration section:
(ie. as what readers are most likely to read, and what you might most want to convey).
"Labs also often don’t have enough staff... to figure out what they should be lobbying governments for (we’d guess that many of the top labs would lobby for things that reduce existential risks)."
~ ~ ~
After that, there is a new section titled "How can you mitigate the downsides of this option?"
"This distinction between ‘capabilities’ research and ‘safety’ research is extremely fuzzy, and we have a somewhat poor track record of predicting which areas of research will be beneficial for safety work in the future. This suggests that work that advances some (and perhaps many) kinds of capabilities faster may be useful for reducing risks."
This seems like a absurd claim. Are 80k actually making it?
EDIT: the claim is made by Benjamin Hilton, one of 80k's analysts and the person the OP is replying too.
It is an extreme claim to make in that context, IMO.
I think Benjamin made it to be nuanced. But the nuance in that article is rather one-sided.
If anything, the nuance should be on the side of identifying any ways you might accidentally support the development of dangerous auto-scaling technologies.
First do, no harm.
Note that we are focussing here on decisions at the individual level.
There are limitations to that.
See my LessWrong comment.
I would agree with Remmelt here. While upskilling people is helpful, if those people then go on to increase the rate of capabilities gain by AI companies, this is reducing the time the world has available to find solutions to alignment and AI regulation.
While, as a rule, I don't disagree with an industries increasing their capabilities, I do disagree with this when those capabilities knowingly lead to human extinction.
I think 1 and 3 seem like arguments that reduce the desirability of these roles but it's hard to see how they can make them net-negative.
Arguments 4 and to some extent 2 give a real case that could in principle make something net-negative but I'm sceptical that the effect scales that far. In particular if this were right, I think it would effectively say that it would be better if AI labs invested less rather than more in safety. I can't rule out that that's correct, but it seems like a pretty galaxy-brained take and I would want some robust arguments before I took it seriously, and I don't think these are close to meeting that threshold for me personally.
Further, I think that there are a bunch of arguments for the value of safety work within labs (e.g. access to sota models; building institutional capacity and learning; cultural outreach) which seem to me to be significant and you're not engaging with.
Yes, specifically by claim 1, positive value can only asymptotically approach 0
(ignoring opportunity costs).
Some relevant aspects are missing in what you shared so far.
Particularly, we need to consider that any one AGI lab is (as of now) beholden to the rest of society to continue operating.
This is clearly true in the limit. Imagine some freak mass catastrophe caused by OpenAI:
staff would leave, consumers would stop buying, and regulators would shut the place down.
But it is also true in practice.
From the outside, these AGI labs may look like institutional pillars of strength. But from the inside, management is constantly jostling, trying to source enough investments and/or profitable productisation avenues to cover high staff salaries and compute costs. This is why I think DeepMind allowed themselves to be acquired by Google in the first place. They ran a $649 million loss in 2019, and could simply not maintain that burn rate without a larger tech corporation covering their losses for them.
In practice, AGI labs are constantly finding ways to make themselves look serious about safety, and finding ways to address safety issues customers are noticing. Not just because some employees there are paying attention to those harms and taking care to avoid them. But also because they're dealing with newly introduced AI products that already have lots of controversies associated to it (in these rough categories: data laundering, worker exploitation, design errors and misuses, resource-intensive and polluting hardware).
If we think about this in simplified dimensions:
Outside stakeholders would need to perceive the system to be unsafe to restrict further scaling and/or uses (which IMO is much more effective than trying to make scaled open-ended systems comprehensively safe after the fact). Where 'the system' can include the institutional hierarchies and infrastructure through which an AI models is developed and deployed.
Corporations have a knack for finding ways to hide product harms, while influencing people to not notice or to dismiss those harms. See cases Big Tabacco, Big Pharma, Big Oil.
Corporations that manage to do that can make profit from selling products without getting shut down. This is what capitalism – open market transactions and private profit reinvestment – in part selects for. This is what Big Tech companies that win out over time manage to do.
(it feels like I'm repeating stuff obvious to you, but it bears repeating to set the context)
Are you stating an intuition that it would be surprising if AGI labs invested less in improving actual safety, then that would be overall less harmful?
I am saying with claim 4. that there is another dimension, perceived safety.
The more that an AI corporation is able to make the system be or at least look *locally* safe to users and other stakeholders (even if globally much more unsafe), the more the rest of society will permit and support the corporations to scale on. And the more that the AI corporation can promote that they are responsibly scaling toward some future aligned system that is *globally* safe, the more that nerdy researchers and other stakeholders open to that kind of messaging can treat that as a sign of virtue and give the corporation a pass there too.
And that unfortunately, by claim 1, that actual safety is intractable when scaling such open-ended (and increasingly automated) systems. Which is why in established safety-critical industries – eg. for medical devices, cars, planes, industrial plants, even kitchen devices – there are best practices for narrowly scoping the design of the machines to specific uses and contexts of use.
So actual safety is intractable for such open-ended systems, but AI corporations can and do disproportionately support research and research communication that increases perceived safety.
But actual safety is tractable for restricting corporate AI scaling (if you reduce the system's degrees of freedom of interaction, you reduce the possible ways things can go wrong). Unfortunately, fewer people move to restrict corporate-AI scaling if the corporate activities are perceived to be safe.
By researching safety at AGI labs, researchers are therefore predominantly increasing perceived safety, and as a result closing off realistic opportunities to improving actual safety.
Thanks. I'm now understanding your central argument to be:
Is that a fair summary?
If so, I think:
Thanks, I appreciate the paraphrase. Yes, that is a great summary.
I hear this all the time, but I also notice that people saying it have not investigated the fundamental limits to controllability that you would encounter with any control system.
As a philosopher, would you not want to have a more generalisable and robust argument that this is actually going to work out?
I'm curious about the pathways you have in mind. I may have missed something here.
I'm skeptical that that would work in this corporate context.
"Capabilities" are just too useful economically and can creep up on you. Putting aside whether we can even measure comprehensively enough for "dangerous capabilities".
In the meantime, it's great marketing to clients, to the media, and to national interests:
You are working on AI systems that could become so capable, that you even have an entire team devoted to capabilities monitoring.
This is interesting. And a fair argument. Will think about this.
I think it's basically things flowing in some form through "the people working on the powerful technology spend time with people seriously concerned with large-scale risks". From a very zoomed out perspective it just seems obvious that we should be more optimistic about worlds where that's happening compared to worlds where it's not (which doesn't mean that necessarily remains true when we zoom in, but it sure affects my priors).
If I try to tell more concrete stories they include things of the form "the safety-concerned people have better situational awareness and may make better plans later", and also "when systems start showing troubling indicators, culturally that's taken much more seriously". (Ok, I'm not going super concrete in my stories here, but that's because I don't want to anchor things on a particular narrow pathway.)
Thanks for clarifying.
Of course I'd prefer to have something more robust. But I don't think the lack of that means it's necessarily useless.
I don't think control is likely to scale to arbitrarily powerful systems. But it may not need to. I think the next phase of the problem is like "keep things safe for long enough that we can get important work out of AI systems", where the important work has to be enough that it can be leveraged to something which sets us up well for the following phases.
Under the concept of 'control', I am including the capacity of the AI system to control their own components' effects.
I am talking about fundamental workings of control. Ie. control theory and cybernetics.
I.e. as general enough that results are applicable to any following phases as well.
Anders Sandberg has been digging lately into fundamental controllability limits.
Could be interesting to talk with Anders.
I would agree that this is a good summary:
If perception of safety is higher than actual safety, it will lead to underinvestment of future safety, which increases the probability of failure of the system.
Let's dig into the arguments you mentioned then.
It would be great if safety researchers at AGI labs start connecting to collaborate effectively on restricting harmful scaling.
I'm going off the brief descriptions you gave.
Does that cover the arguments as you meant them? What did I miss?
Of the four reasons you listed, reason 4 (safety washing) seems the most important. Safety-washing, alongside the related ethics-washing and green-washing are effective techniques that industry uses to increase peoples perception of the industry. Lizka wrote a post on this. These techniques are used by many industries, particularly by industries that produce significant externalities such as the oil industry. These techniques are used because they work, because they give people an out. It is easier to think about the shiny flowers on an ad than it is to think about the reality of an industry killing people.
Safety-washing of AI is harmful as it gives people an out, a chance to repeat the line "well at least they are allegedly doing some safety stuff", which is a convenient distraction from the fact that AI labs are knowingly developing a technology that can cause human extinction. This distraction causes otherwise safety-conscious people to invest in or work in an industry that they would reconsider if they had access to all the information. By pointing out this distraction, we can help people make more informed decisions.
Very much agreed.
Yes, I think this is a very useful phenomenon to point at, and some people have a very naïve understanding of what these labs do, especially technical AI safety researchers that have a technical background where skills of critical thinking have not been at the heart of their education. I heard a lot of very candid remarks about the political influence carried out by these labs, and I am worried that these researchers lack a more global understanding of the effects of their work.
Given OpenAI's recent updates on military bans and transparency of documents, I find myself more and more cautious when it comes to trusting anyone working on AI safety. I would love to see representatives of these labs addressing the concerns raised in this post in a credible way.