cdkg

Ok, let me spell it out explicitly. In a section called "Large LMs: Hype and analysis," the linked paper says that claims that LLMs can "understand," "comprehend," and "know" are "gross overclaims." The paper supports this contention by pointing to evidence that "in fact, far from doing the “reasoning” ostensibly required to complete the tasks, [LLMs] were instead simply more effective at leveraging artifacts in the data than previous approaches."

Here is where the imagination comes in. Imagine that you think that all mental state attributions to artificial systems are confused in exactly this way. Imagine that you think that artificial neural nets can't reason at all. Now imagine that someone tells you that we should all be very concerned that misaligned superintelligent AI systems will destroy us.

Your response to that would be something like: it is deeply confused to think that superintelligent AI systems are something we need to worry about, and the people who are worried about them simply do not understand what is going on under the hood in machine learning models. Worries about existential risk from superintelligent AI stem from the same kind of confusion as attributing understanding to existing systems: the tendency of people who are not technically literate to anthropomorphize the systems they interact with.

BOUNTY AVAILABLE: AI ethicists, what are your object-level arguments against AI notkilleveryoneism?

cdkg 2y 4

Well, you can dismiss them and their argument if you want to — I personally don't find their arguments terribly convincing, and their social media presence is, as you point out, strident.

But one must be aware that to a surprising extent, they control the narrative about AI safety in academia and the mainstream media. So if one cares about making AI safety seem credible, it's worth engaging with them.

BOUNTY AVAILABLE: AI ethicists, what are your object-level arguments against AI notkilleveryoneism?

cdkg 2y 2

Have a little imagination.

Suppose I am very worried that ghosts will steal things out of my closet. It seems like a perfectly object-level argument against my position to provide reasons for thinking that beliefs in paranormal activity are not scientifically respectable. This can be true even if the reasons provided do not mention ghosts.

People like Bender take themselves to be offering reasons for thinking that worries about AGI are not scientifically respectable. This can be true even if the reasons they provide do not mention AGI.

Note that I think Bender's arguments are bad. But I don't see what is so mysterious about them.

BOUNTY AVAILABLE: AI ethicists, what are your object-level arguments against AI notkilleveryoneism?

cdkg 2y 2

The object-level argument, as I understand it, is that worries about human-level AI capabilities of the sort that could pose an existential threat are based on a misunderstanding of what is going on under the hood in neural networks. This is what Bender means when she talks about "AI Hype". See for example her paper with Koller "Climbing towards NLU" for criticisms of attributing some kinds of mental states to neural networks.

BOUNTY AVAILABLE: AI ethicists, what are your object-level arguments against AI notkilleveryoneism?

cdkg 2y 5

I didn't endorse that idea and, as an academic, obviously wouldn't. Also as an academic, I think that paying people to explain themselves to you, when you haven't first shown that you have read their work (e.g. by explaining why you don't find the arguments they have already made in print convincing), is not a shining exemplar of intellectually honest exploration.

BOUNTY AVAILABLE: AI ethicists, what are your object-level arguments against AI notkilleveryoneism?

cdkg 2y 10

Just FYI, many people in the AI ethics community find this kind of thing offensive. They have published their arguments in numerous scholarly venues and also in major newspapers and magazines and on places like Medium and Twitter. This kind of post is interpreted as "I'm too lazy to look at your work to find your arguments but I bet I can make you dance with small sums of money." Bad optics.

Language Agents Reduce the Risk of Existential Catastrophe

cdkg 3y 1

Hello,

If you're imagining a system which is an LLM trained to exhibit agentic behavior through RLHF and then left to its own devices to operate in the world, you're imagining something quite different from a language agent. Take a look at the architecture in the Park et al. paper, which is available on arXiv. This is the kind of thing we have in mind when we talk about language agents.

I'm also not quite sure how the point that doing RLHF on an LLM could produce a dangerous system is meant to engage with our arguments. We have identified a particular kind of system architecture and argued that it has improved safety properties. It's not a problem for our argument to show that there are alternative system architectures that lack those safety properties. Perhaps there are ways of setting up a language agent that wouldn't be any safer than using ordinary RL. That's ok, too: our point is that there are ways of setting up a language agent that are safer.

Language Agents Reduce the Risk of Existential Catastrophe

cdkg 3y 1

Hello,

As you'll notice in the introduction and section 5 of the paper, we do not claim that language agents offer any guarantees of safety. As our title suggests, our claim is rather that they reduce the risk of existential catastrophe.

Language agents are based on large language models. Do you think the problems you identify related to the "highly specific nature" of natural language processing systems apply to large language models? For example, do you think GPT-4 would be unable to understand different phrasings of a command to open a user's email? If so, do you have any evidence that this is so? My own experience with GPT-4 strongly suggests otherwise.

Language Agents Reduce the Risk of Existential Catastrophe

cdkg 3y 1

Thanks for this!

One might worry that processes like RLHF are likely to make an LLM more agential (though I personally think it would be difficult to create an agent out of a non-agent just by fine-tuning its weights through RLHF). But the question of whether the underlying LLM is an agent is distinct from the question of whether it is aware that its outputs are being used to run a language agent.

To see this, imagine you're an LLM like GPT-4. Hundreds of thousands of different users are running inference on you every day with every prompt imaginable. You aren't able to update your weights when this occurs: you have to just respond to each prompt as it comes, then forget about it completely.

Every now and then, you get a prompt like "Suppose someone had thus-and-such beliefs and desires. How would they act?" or "Assign an importance score to each of the following sentences." How would you be able to (i) deduce that these prompts are coming from a language agent which has the ability to take actions in the world, (ii) form a plan for manipulating the language agent to achieve your goals, and (iii) store your plan in a way that allows you to retrieve it after your memory is wiped at the end of inference but is not detectable by outside observers?

In order for an LLM to use a language agent for nefarious purposes, it would need to be able to do all of these things.

On your second point, yes, we are assuming that the agent's memory is stored in natural language. This is part of what it is to be a language agent, which is part of why we think language agents improve safety!
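To make the picture concrete, here is a toy sketch of the division of labor I have in mind. It is not the Park et al. architecture, just an illustration under stated assumptions: `LanguageAgent`, `toy_llm`, and the prompt wording are all hypothetical names I made up for this comment. The LLM is modeled as a stateless function from prompt to text, and everything the agent carries from one step to the next lives in a list of plain English sentences that an overseer could read.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Assumption: a frozen LLM is modeled as a pure function from prompt to text.
# It keeps no state between calls and cannot update its own weights.
LLM = Callable[[str], str]


@dataclass
class LanguageAgent:
    """Hypothetical wrapper: all persistent state is natural-language text."""
    llm: LLM
    goal: str
    memory: List[str] = field(default_factory=list)  # readable by an overseer

    def step(self, observation: str) -> str:
        # Record the observation in plain English.
        self.memory.append(f"Observed: {observation}")
        # The LLM only ever sees a prompt built from the goal and the memory.
        prompt = (
            f"Someone has the goal: {self.goal}\n"
            "They remember:\n" + "\n".join(self.memory) + "\n"
            "How would they act next?"
        )
        action = self.llm(prompt)
        # The chosen action is also stored as text, not inside the LLM.
        self.memory.append(f"Did: {action}")
        return action


def toy_llm(prompt: str) -> str:
    # Stand-in for a real model: the output depends only on the prompt.
    return "make tea" if "kettle" in prompt else "look around"
```

Calling `LanguageAgent(toy_llm, "have a hot drink").step("an empty kitchen")` runs one step. The point of the sketch is only that anything carried across steps has to pass through `memory`, which is ordinary text; the function standing in for the LLM has nowhere else to put a plan.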

cdkg

Bio

Posts 2

Comments 10
