Spreading messages to help with the most important century

Holden Karnofsky

This is a linkpost for https://www.cold-takes.com/spreading-messages-to-help-with-the-most-important-century/

In the most important century series, I argued that the 21st century could be the most important century ever for humanity, via the development of advanced AI systems that could dramatically speed up scientific and technological advancement, getting us more quickly than most people imagine to a deeply unfamiliar future.

In this more recent series, I’ve been trying to help answer this question: “So what? What can I do to help?”

So far, I’ve just been trying to build a picture of some of the major risks we might face (especially the risk of misaligned AI that could defeat all of humanity), what might be challenging about these risks, and why we might succeed anyway. Now I’ve finally gotten to the part where I can start laying out tangible ideas for how to help (beyond the pretty lame suggestions I gave before).

This piece is about one broad way to help: spreading messages that ought to be more widely understood.

One reason I think this topic is worth a whole piece is that practically everyone can help with spreading messages at least some, via things like talking to friends; writing explanations of your own that will appeal to particular people; and, yes, posting to Facebook and Twitter and all of that. Call it slacktivism if you want, but I’d guess it can be a big deal: many extremely important AI-related ideas are understood by vanishingly small numbers of people, and a bit more awareness could snowball. Especially because these topics often feel too “weird” for people to feel comfortable talking about them! Engaging in credible, reasonable ways could contribute to an overall background sense that it’s OK to take these ideas seriously.

And then there are a lot of potential readers who might have special opportunities to spread messages. Maybe they are professional communicators (journalists, bloggers, TV writers, novelists, TikTokers, etc.), maybe they’re non-professionals who still have sizable audiences (e.g., on Twitter), maybe they have unusual personal and professional networks, etc. Overall, the more you feel you are good at communicating with some important audience (even a small one), the more this post is for you.

That said, I’m not excited about blasting around hyper-simplified messages. As I hope this series has shown, the challenges that could lie ahead of us are complex and daunting, and shouting stuff like “AI is the biggest deal ever!” or “AI development should be illegal!” could do more harm than good (if only by associating important ideas with being annoying). Relatedly, I think it’s generally not good enough to spread the most broad/relatable/easy-to-agree-to version of each key idea, like “AI systems could harm society.” Some of the unintuitive details are crucial.

Instead, the gauntlet I’m throwing is: “find ways to help people understand the core parts of the challenges we might face, in as much detail as is feasible.” That is: the goal is to try to help people get to the point where they could maintain a reasonable position in a detailed back-and-forth, not just to get them to repeat a few words or nod along to a high-level take like “AI safety is important.” This is a lot harder than shouting “AI is the biggest deal ever!”, but I think it’s worth it, so I’m encouraging people to rise to the challenge and stretch their communication skills.

Below, I will:

Outline some general challenges of this sort of message-spreading.
Go through some ideas I think it’s risky to spread too far, at least in isolation.
Go through some of the ideas I’d be most excited to see spread.
Talk a little bit about how to spread ideas - but this is mostly up to you.

Challenges of AI-related messages

Here’s a simplified story for how spreading messages could go badly.

You’re trying to convince your friend to care more about AI risk.
You’re planning to argue: (a) AI could be really powerful and important within our lifetimes; (b) Building AI too quickly/incautiously could be dangerous.
- Your friend just isn’t going to care about (b) if they aren’t sold on some version of (a). So you’re starting with (a).
Unfortunately, (a) is easier to understand than (b). So you end up convincing your friend of (a), and not (yet) (b).
Your friend announces, “Aha - I see that AI could be tremendously powerful and important! I need to make sure that people/countries I like are first to build it!” and runs off to help build powerful AI as fast as possible. They’ve chosen the competition frame (“will the right or the wrong people build powerful AI first?”) over the caution frame (“will we screw things up and all lose?”), because the competition frame is easier to understand.
Why is this bad? See previous pieces on the importance of caution.

More on the “competition” frame vs. the “caution” frame

In a previous piece, I talked about two contrasting frames for how to make the best of the most important century:

The “caution” frame. This frame emphasizes that a furious race to develop powerful AI could end up making everyone worse off. This could be via: (a) AI forming dangerous goals of its own and defeating humanity entirely; (b) humans racing to gain power and resources and “lock in” their values.

Ideally, everyone with the potential to build powerful enough AI would be able to pour energy into building something safe (not misaligned), and carefully planning out (and negotiating with others on) how to roll it out, without a rush or a race. With this in mind, perhaps we should be doing things like:

Working to improve trust and cooperation between major world powers. Perhaps via AI-centric versions of Pugwash (an international conference aimed at reducing the risk of military conflict), perhaps by pushing back against hawkish foreign relations moves.
Discouraging governments and investors from shoveling money into AI research, encouraging AI labs to thoroughly consider the implications of their research before publishing it or scaling it up, working toward standards and monitoring, etc. Slowing things down in this manner could buy more time to do research on avoiding misaligned AI, more time to build trust and cooperation mechanisms, and more time to generally gain strategic clarity.

The “competition” frame. This frame focuses less on how the transition to a radically different future happens, and more on who's making the key decisions as it happens.

If something like PASTA is developed primarily (or first) in country X, then the government of country X could be making a lot of crucial decisions about whether and how to regulate a potential explosion of new technologies.
In addition, the people and organizations leading the way on AI and other technology advancement at that time could be especially influential in such decisions.

This means it could matter enormously "who leads the way on transformative AI" - which country or countries, which people or organizations.

Some people feel that we can make confident statements today about which specific countries, and/or which people and organizations, we should hope lead the way on transformative AI. These people might advocate for actions like:

Increasing the odds that the first PASTA systems are built in countries that are e.g. less authoritarian, which could mean e.g. pushing for more investment and attention to AI development in these countries.
Supporting and trying to speed up AI labs run by people who are likely to make wise decisions (about things like how to engage with governments, what AI systems to publish and deploy vs. keep secret, etc.)

Tension between the two frames. People who take the "caution" frame and people who take the "competition" frame often favor very different, even contradictory actions. Actions that look important to people in one frame often look actively harmful to people in the other.

For example, people in the "competition" frame often favor moving forward as fast as possible on developing more powerful AI systems; for people in the "caution" frame, haste is one of the main things to avoid. People in the "competition" frame often favor adversarial foreign relations, while people in the "caution" frame often want foreign relations to be more cooperative.

That said, this dichotomy is a simplification. Many people - including myself - resonate with both frames. But I have a general fear that the “competition” frame is going to be overrated by default for a number of reasons, as I discuss here.

Unfortunately, I’ve seen something like the above story play out in multiple significant instances (though I shouldn’t give specific examples).

And I’m especially worried about this dynamic when it comes to people in and around governments (especially in national security communities), because I perceive governmental culture as particularly obsessed with staying ahead of other countries (“If AI is dangerous, we’ve gotta build it first”) and comparatively uninterested in things that are dangerous for our country because they’re dangerous for the whole world at once (“Maybe we should worry a lot about pandemics?”).^[1]

You could even argue (although I wouldn’t agree!^[2]) that to date, efforts to “raise awareness” about the dangers of AI have done more harm than good (via causing increased investment in AI, generally).

So it’s tempting to simply give up on the whole endeavor - to stay away from message spreading entirely, beyond people you know well and/or are pretty sure will internalize the important details. But I think we can do better.

This post is aimed at people who are good at communicating with at least some audience. This could be because of their skills, or their relationships, or some combination. In general, I’d expect to have more success with people who hear from you a lot (because they’re your friend, or they follow you on Twitter or Substack, etc.) than with people you reach via some viral blast of memery - but maybe you’re skilled enough to make the latter work too, which would be awesome. I'm asking communicators to hit a high bar: leave people with strong understanding, rather than just getting them to repeat a few sentences about AI risk.

Messages that seem risky to spread in isolation

First, here are a couple of messages that I’d rather people didn’t spread (or at least have mixed feelings about spreading) in isolation, i.e., without serious efforts to include some of the other messages I cover below.

One category is messages that generically emphasize the importance and potential imminence of powerful AI systems. The reason for this is in the previous section: many people seem to react to these ideas (especially when unaccompanied by some other key ones) with a “We’d better build powerful AI as fast as possible, before others do” attitude. (If you’re curious about why I wrote The Most Important Century anyway, see footnote for my thinking.)^[3]

Another category is messages that emphasize that AI could be risky/dangerous to the world, without much effort to fill in how, or with an emphasis on easy-to-understand risks.

Since “dangerous” tends to imply “powerful and important,” I think there are similar risks to the previous section.
If people have a bad model of how and why AI could be risky/dangerous (missing key risks and difficulties), they might be too quick to later say things like “Oh, turns out this danger is less bad than I thought, let’s go full speed ahead!” Below, I outline how misleading “progress” could lead to premature dismissal of the risks.

Messages that seem important and helpful (and right!)

We should worry about conflict between misaligned AI and all humans

Unlike the messages discussed in the previous section, this one directly highlights why it might not be a good idea to rush forward with building AI oneself.

The idea that an AI could harm the same humans who build it has very different implications from the idea that AI could be generically dangerous/powerful. Less “We’d better get there before others,” more “there’s a case for moving slowly and working together here.”

The idea that AI could be a problem for the same people who build it is common in fictional portrayals of AI (HAL 9000, Skynet, The Matrix, Ex Machina) - maybe too much so? It seems to me that people tend to balk at the “sci-fi” feel, and what’s needed is more recognition that this is a serious, real-world concern.

The main pieces in this series making this case are Why would AI “aim” to defeat humanity? and AI could defeat all of us combined. There are many other pieces on the alignment problem (see list here); also see Matt Yglesias's case for specifically embracing the “Terminator”/Skynet analogy.

I’d be especially excited for people to spread messages that help others understand - at a mechanistic level - how and why AI systems could end up with dangerous goals of their own, deceptive behavior, etc. I worry that by default, the concern sounds like lazy anthropomorphism (thinking of AIs just like humans).

Transmitting ideas about the “how and why” is a lot harder than getting people to nod along to “AI could be dangerous.” I think there’s a lot of effort that could be put into simple, understandable yet relatable metaphors/analogies/examples (my pieces make some effort in this direction, but there’s tons of room for more).

AIs could behave deceptively, so “evidence of safety” might be misleading

I’m very worried about a sequence of events like:

As AI systems become more powerful, there are some concerning incidents, and widespread concern about “AI risk” grows.
But over time, AI systems are “better trained” - e.g., given reinforcement to stop them from behaving in unintended ways - and so the concerning incidents become less common.
Because of this, concern dissipates, and it’s widely believed that AI safety has been “solved.”
But what’s actually happened is that the “better training” has caused AI systems to behave deceptively - to appear benign in most situations, and to cause trouble only when (a) this wouldn’t be detected or (b) humans can be overpowered entirely.

I worry about AI systems being deceptive in the same way a human might: going through chains of reasoning like “If I do X, I might get caught, but if I do Y, no one will notice until it’s too late.” But it can be hard to get this concern taken seriously, because it means attributing behavior to AI systems that we currently associate exclusively with humans (today’s AI systems don’t really do things like this^[4]).

One of the central things I’ve tried to spell out in this series is why an AI system might engage in this sort of systematic deception, despite being very unlike humans (and not necessarily having e.g. emotions). It’s a major focus of both of these pieces from this series:

Whether this point is widely understood seems quite crucial to me. We might end up in a situation where (a) there are big commercial and military incentives to rush ahead with AI development; (b) we have what seems like a set of reassuring experiments and observations.

At that point, it could be key whether people are asking tough questions about the many ways in which “evidence of AI safety” could be misleading, which I discussed at length in AI Safety Seems Hard to Measure.

Why AI safety could be hard to measure

In previous pieces, I argued that:

If we develop powerful AIs via ambitious use of the “black-box trial-and-error” common in AI development today, then there’s a substantial risk that:

These AIs will develop unintended aims (states of the world they make calculations and plans toward, as a chess-playing AI "aims" for checkmate);
These AIs could deceive, manipulate, and even take over the world from humans entirely as needed to achieve those aims.
People today are doing AI safety research to prevent this outcome, but such research has a number of deep difficulties:

“Great news - I’ve tested this AI and it looks safe.” Why might we still have a problem?
Problem	Key question	Explanation
The Lance Armstrong problem	Did we get the AI to be actually safe or good at hiding its dangerous actions?	When dealing with an intelligent agent, it’s hard to tell the difference between “behaving well” and “appearing to behave well.” When professional cycling was cracking down on performance-enhancing drugs, Lance Armstrong was very successful and seemed to be unusually “clean.” It later came out that he had been using drugs with an unusually sophisticated operation for concealing them.
The King Lear problem	The AI is (actually) well-behaved when humans are in control. Will this transfer to when AIs are in control?	It's hard to know how someone will behave when they have power over you, based only on observing how they behave when they don't. AIs might behave as intended as long as humans are in control - but at some future point, AI systems might be capable and widespread enough to have opportunities to take control of the world entirely. It's hard to know whether they'll take these opportunities, and we can't exactly run a clean test of the situation. Like King Lear trying to decide how much power to give each of his daughters before abdicating the throne.
The lab mice problem	Today's "subhuman" AIs are safe. What about future AIs with more human-like abilities?	Today's AI systems aren't advanced enough to exhibit the basic behaviors we want to study, such as deceiving and manipulating humans. Like trying to study medicine in humans by experimenting only on lab mice.
The first contact problem	Imagine that tomorrow's "human-like" AIs are safe. How will things go when AIs have capabilities far beyond humans'?	AI systems might (collectively) become vastly more capable than humans, and it's ... just really hard to have any idea what that's going to be like. As far as we know, there has never before been anything in the galaxy that's vastly more capable than humans in the relevant ways! No matter what we come up with to solve the first three problems, we can't be too confident that it'll keep working if AI advances (or just proliferates) a lot more. Like trying to plan for first contact with extraterrestrials (this barely feels like an analogy).

An analogy that incorporates these challenges is Ajeya Cotra’s “young businessperson” analogy:

Imagine you are an eight-year-old whose parents left you a $1 trillion company and no trusted adult to serve as your guide to the world. You must hire a smart adult to run your company as CEO, handle your life the way that a parent would (e.g. decide your school, where you’ll live, when you need to go to the dentist), and administer your vast wealth (e.g. decide where you’ll invest your money).

You have to hire these grownups based on a work trial or interview you come up with -- you don't get to see any resumes, don't get to do reference checks, etc. Because you're so rich, tons of people apply for all sorts of reasons.

If your applicants are a mix of "saints" (people who genuinely want to help), "sycophants" (people who just want to make you happy in the short run, even when this is to your long-term detriment) and "schemers" (people who want to siphon off your wealth and power for themselves), how do you - an eight-year-old - tell the difference?

More: AI safety seems hard to measure

AI projects should establish and demonstrate safety (and potentially comply with safety standards) before deploying powerful systems

I’ve written about the benefits we might get from “safety standards.” The idea is that AI projects should not deploy systems that pose too much risk to the world, as evaluated by a systematic evaluation regime: AI systems could be audited to see whether they are safe. I've outlined how AI projects might self-regulate by publicly committing to having their systems audited (and not deploying dangerous ones), and how governments could enforce safety standards both nationally and internationally.

Today, development of safety standards is in its infancy. But over time, I think it could matter a lot how much pressure AI projects are under to meet safety standards. And I think it’s not too early, today, to start spreading the message that AI projects shouldn’t unilaterally decide to put potentially dangerous systems out in the world; the burden should be on them to demonstrate and establish safety before doing so.

How standards might be established and become national or international

I previously laid out a possible vision on this front, which I’ll give a slightly modified version of here:

Today’s leading AI companies could self-regulate by committing not to build or deploy a system that they can’t convincingly demonstrate is safe (e.g., see Google’s 2018 statement, “We will not design or deploy AI in weapons or other technologies whose principal purpose or implementation is to cause or directly facilitate injury to people”).
- Even if some people at the companies would like to deploy unsafe systems, it could be hard to pull this off once the company has committed not to.
- Even if there’s a lot of room for judgment in what it means to demonstrate an AI system is safe, having agreed in advance that certain evidence is not good enough could go a long way.
As more AI companies are started, they could feel soft pressure to do similar self-regulation, and refusing to do so could be off-putting to potential employees, investors, etc.
Eventually, similar principles could be incorporated into various government regulations and enforceable treaties.
Governments could monitor for dangerous projects using regulation and even overseas operations. E.g., today the US monitors (without permission) for various signs that other states might be developing nuclear weapons, and might try to stop such development with methods ranging from threats of sanctions to cyberwarfare or even military attacks. It could do something similar for any AI development projects that are using huge amounts of compute and haven’t volunteered information about whether they’re meeting standards.

Alignment research is prosocial and great

Most people reading this can’t go and become groundbreaking researchers on AI alignment. But they can contribute to a general sense that the people who can do this (mostly) should.

Today, my sense is that most “science” jobs are pretty prestigious, and seen as good for society. I have pretty mixed feelings about this:

I think science has been good for humanity historically.
But I worry that as technology becomes more and more powerful, there’s a growing risk of a catastrophe (particularly via AI or bioweapons) that wipes out all the progress to date and then some. (I've written that the historical trend to date arguably fits something like "Declining everyday violence, offset by bigger and bigger rare catastrophes.") I think our current era would be a nice time to adopt an attitude of “proceed with caution” rather than “full speed ahead.”
I resonate with Toby Ord’s comment (in The Precipice), “humanity is akin to an adolescent, with rapidly developing physical abilities, lagging wisdom and self-control, little thought for its longterm future and an unhealthy appetite for risk.”

I wish there were more effort, generally, to distinguish between especially dangerous science and especially beneficial science. AI alignment seems squarely in the latter category.

I’d be especially excited for people to spread messages that give a sense of the specifics of different AI alignment research paths, how they might help or fail, and what’s scientifically/intellectually interesting (not just useful) about them.

The main relevant piece in this series is High-level hopes for AI alignment, which distills a longer piece (How might we align transformative AI if it’s developed very soon?) that I posted on the Alignment Forum.

There are a number (hopefully growing) of other careers that I consider especially valuable, which I'll discuss in my next post on this topic.

It might be important for companies (and other institutions) to act in unusual ways

In Racing through a Minefield: the AI Deployment Problem, I wrote:

A lot of the most helpful actions might be “out of the ordinary.” When racing through a minefield, I hope key actors will:

Put more effort into alignment, threat assessment, and security than is required by commercial incentives;
Consider measures for avoiding races and global monitoring that could be very unusual, even unprecedented;
Do all of this in the possible presence of ambiguous, confusing information about the risks.

It always makes me sweat when I’m talking to someone from an AI company and they seem to think that commercial success and benefiting humanity are roughly the same goal/idea.

(To be clear, I don't think an AI project's only goal should be to avoid the risk of misaligned AI. I've given this risk a central place in this piece partly because I think it's especially at risk of being too quickly dismissed - but I don't think it's the only major risk. I think AI projects need to strike a tricky balance between the caution and competition frames, and consider a number of issues beyond the risk of misalignment. But I think it's a pretty robust point that they need to be ready to do unusual things rather than just following commercial incentives.)

I’m nervous about a world in which:

Most people stick with paradigms they know - a company should focus on shareholder value, a government should focus on its own citizens (rather than global catastrophic risks), etc.
As the pace of progress accelerates, we’re sitting here with all kinds of laws, norms and institutions that aren’t designed for the problems we’re facing - and can’t adapt in time. A good example would be the way governance works for a standard company: it’s legally and structurally obligated to be entirely focused on benefiting its shareholders, rather than humanity as a whole. (There are alternative ways of setting up a company without these problems!^[5])

At a minimum (as I argued previously), I think AI companies should be making sure they have whatever unusual governance setups they need in order to prioritize benefits to humanity - not returns to shareholders - when the stakes get high. I think we’d see more of this if more people believed something like: “It might be important for companies (and other institutions) to act in unusual ways.”

We’re not ready for this

If we’re in the most important century, there’s likely to be a vast set of potential challenges ahead of us, most of which have gotten very little attention. (More here: Transformative AI issues (not just misalignment): an overview)

If it were possible to slow everything down, by default I’d think we should. Barring that, I’d at least like to see people generally approaching the topic of AI with a general attitude along the lines of “We’re dealing with something really big here, and we should be trying really hard to be careful and humble and thoughtful” (as opposed to something like “The science is so interesting, let’s go for it” or “This is awesome, we’re gonna get rich” or “Whatever, who cares”).

I’ll re-excerpt this table from an earlier piece:

Situation	Appropriate reaction (IMO)
"This could be a billion-dollar company!"	"Woohoo, let's GO for it!"
"This could be the most important century!"	"... Oh ... wow ... I don't know what to say and I somewhat want to vomit ... I have to sit down and think about this one."

I’m not at all sure about this, but one potential way to spread this message might be to communicate, with as much scientific realism, detail and believability as possible, about what the world might look like after explosive scientific and technological advancement brought on by AI (for example, a world with digital people). I think the enormous unfamiliarity of some of the issues such a world might face - and the vast possibilities for utopia or dystopia - might encourage an attitude of not wanting to rush forward.

How to spread messages like these?

I’ve tried to write a series that explains the key issues to careful readers, hopefully better equipping them to spread helpful messages. From here, individual communicators need to think about the audiences they know and the mediums they use (Twitter? Facebook? Essays/newsletters/blog posts? Video? In-person conversation?) and what will be effective with those audiences and mediums.

The main guidelines I want to advocate:

Err toward sustained, repeated, relationship-based communication as opposed to prioritizing “viral blasts” (unless you are so good at the latter that you feel excited to spread the pretty subtle ideas in this piece that way!)
Aim high: try for the difficult goal of “My audience walks away really understanding key points” rather than the easier goal of “My audience has hit the ‘like’ button for a sort of related idea.”
A consistent piece of feedback I’ve gotten on my writing is that making things as concrete as possible is helpful - so giving real-world examples of problems analogous to the ones we’re worried about, or simple analogies that are easy to imagine and remember, could be key. But it’s important to choose these carefully so that the key dynamics aren’t lost.

Footnotes

[1] Killer Apps and Technology Roulette are interesting pieces trying to sell policymakers on the idea that “superiority is not synonymous with security.”
[2] When I imagine what the world would look like without any of the efforts to “raise awareness,” I picture a world with close to zero awareness of - or community around - major risks from transformative AI. While this world might also have more time left before dangerous AI is developed, on balance this seems worse. A future piece will elaborate on the many ways I think a decent-sized community can help reduce risks.
[3] I do think “AI could be a huge deal, and soon” is a very important point that somewhat serves as a prerequisite for understanding this topic and doing helpful work on it, and I wanted to make this idea more understandable and credible to a number of people - as well as to create more opportunities to get critical feedback and learn what I was getting wrong.
But I was nervous about the issues noted in this section. With that in mind, I did the following things:
- The title, “most important century,” emphasizes a time frame that I expect to be less exciting/motivating for the sorts of people I’m most worried about (compared to the sorts of people I most wanted to draw in).
- I tried to persistently and centrally raise concerns about misaligned AI (raising it in two pieces, including one (guest piece) devoted to it, before I started discussing how soon transformative AI might be developed), and extensively discussed the problems of overemphasizing “competition” relative to “caution.”
- I ended the series with a piece arguing against being too “action-oriented.”
- I stuck to “passive” rather than “active” promotion of the series, e.g., I accepted podcast invitations but didn’t seek them out. I figured that people with proactive interest would be more likely to give in-depth, attentive treatments rather than low-resolution, oversimplified ones.
I don’t claim to be sure I got all the tradeoffs right.
[4] There are some papers arguing that AI systems do things something like this (e.g., see the “Challenges” section of this post), but I think the dynamic is overall pretty far from what I’m most worried about.
[5] E.g., public benefit corporation

Henry Howard | Jan 29 2023 | 22

I don't like this post and I don't think it should be pinned to the forum front page.

A few reasons:

1. The general message of "go and spread this message, this is the way to do it" is too self-assured and unquestioning. It appears cultish. It's off-putting to have this as the first thing that forum visitors will see.
2. The thesis of the post is that a useful thing for everyone to do is to spread a message about AI safety, but it's not clear what messages you think should be spread. The only two I could see are "relate it to Skynet" and "even if AI looks safe it might not be".
3. Too many prerequisites: this post refers to five or ten other posts as a "this concept is properly explained here" thing. Many of these posts reference further posts. This is a red flag to me of poor writing and/or poor ideas. Either a) your ideas are so complex that they do indeed require many thousands of words to explain (in which case, fine), or b) they're not that complex and just aren't being communicated well, or c) bad ideas are being obscured in a tower of readings that gatekeep the critics away. I'd like to see the actual ideas you're referring to expressed clearly, instead of referring to other posts.
4. Having this pinned to the front page further reinforces the disproportionate focus that AI Safety gets on the forum.

NunoSempere · Jan 30 2023 · 18

Personally, an argument I would find more compelling is that the OP doesn't answer comments, which lowers the value of discussion and makes the post less interesting for a public forum. Also, there is already a newsletter for Cold Takes that people can subscribe to.

Holden Karnofsky · Mar 21 2023 · 4

Noting that I’m now going back through posts responding to comments, after putting off doing so for months - I generally find it easier to do this in bulk to avoid being distracted from my core priorities, though this time I think I put it off longer than I should’ve.

It is generally true that my participation in comments is extremely sporadic/sparse, and folks should factor that into curation decisions.

sqgroves · Jan 29 2023 · 16

These don't seem very compelling to me.

1. This argument proves too much. The same could be said of "go and donate your money, this (list of charities we think are most effective) is the way to do it".
2. My takeaway was that messages which could be spread include: "we should worry about conflict between misaligned AI and all humans", "AIs could behave deceptively, so evidence of safety might be misleading", "AI projects should establish and demonstrate safety (and potentially comply with safety standards) before deploying powerful systems", "alignment research is prosocial and great" and "we’re not ready for this". (I excluded "it might be important for companies and other institutions to act in unusual ways", because I agree this doesn't seem like a straightforward message to spread.)
3. The answer is probably (a).
4. "Disproportionate" seems like it boils down to an object-level disagreement about relative cause prioritisation between AI safety and other causes.

freedomandutility · Jan 29 2023 · 9

I like the framing "bad ideas are being obscured in a tower of readings that gatekeep the critics away" and I think EA is guilty of this sometimes in other areas too.

Holden Karnofsky · Mar 21 2023 · 6

Just noting that many of the “this concept is properly explained elsewhere” links are also accompanied by expandable boxes that you can click to expand for the gist. I do think that understanding where I’m coming from in this piece requires a bunch of background, but I’ve tried to make it as easy on readers as I could, e.g. explaining each concept in brief and providing a link if the brief explanation isn’t clear enough or doesn’t address particular objections.

Lauren Maria · Jan 29 2023 · 4

I agree. I’m curious what the process is for deciding what gets pinned to the front page. Does anyone know?

Lizka · Jan 30 2023 · 10

Hi! The process for curation is outlined here. In short, some people can suggest curation, and I currently make the final calls.

You can also see a list of other posts that have been curated (you can get to the list by clicking on the star next to a curated post's title).

Lauren Maria · Jan 30 2023 · -1

Oh, I see! Thanks, that's helpful.

Lizka · Jan 28 2023 · 15

Thanks for writing this! I'm curating it.

Some things I really appreciate about the post:

The claim (paraphrased), "it is pretty easy to get AI safety messaging wrong, but there are some useful things to communicate about AI safety" seems important (and right — I've also seen examples of people accidentally spreading the idea that "AI will be powerful"). I also think lots of people in the EA community should hear it — a good number of people are in fact working on "spreading the ideas of AI safety" (see a related topic page).
It's very nice to have more content on things that ~everyone can help with.
1. "practically everyone can help with spreading messages at least some, via things like talking to friends; writing explanations of your own that will appeal to particular people; and, yes, posting to Facebook and Twitter and all of that. [...] I’d guess it can be a big deal: many extremely important AI-related ideas are understood by vanishingly small numbers of people, and a bit more awareness could snowball. Especially because these topics often feel too “weird” for people to feel comfortable talking about them! Engaging in credible, reasonable ways could contribute to an overall background sense that it’s OK to take these ideas seriously."
The lists of kinds of messages that are risky/helpful are helpful:
1. Risky (presumably not an exhaustive list!):
  1. messages that generically emphasize the importance and potential imminence of powerful AI systems
  2. messages that emphasize that AI could be risky/dangerous to the world, without much effort to fill in how, or with an emphasis on easy-to-understand risks (where one of the risks is, "If people have a bad model of how and why AI could be risky/dangerous (missing key risks and difficulties), they might be too quick to later say things like “Oh, turns out this danger is less bad than I thought, let’s go full speed ahead!”")
2. Helpful + right (This list is presumably also not exhaustive. I should also say that I'm least optimistic about iii (sort of) and v.)
  1. [S] We should worry about conflict between misaligned AI and all humans
  2. [S] AIs could behave deceptively, so “evidence of safety” might be misleading
  3. [S] AI projects should establish and demonstrate safety (and potentially comply with safety standards) before deploying powerful systems
  4. [S] Alignment research is prosocial and great
  5. [S] It might be important for companies (and other institutions) to act in unusual ways
  6. [S] We’re not ready for this

One question/disagreement/clarification I have about the statement, "I’m not excited about blasting around hyper-simplified messages."

The word "simplified" is a bit vague; I think I disagree with some interpretations of the sentence. I agree that "it’s generally not good enough to spread the most broad/relatable/easy-to-agree-to version of each key idea," but I think in some cases, "simplifying" could be really useful for spreading more accurate messages. In particular, "simplifying" could mean something like "dumbing down somewhat indiscriminately" — which is bad/risky — or it could mean something like "shortening and focusing on the key points, making technical points accessible to a more general audience, etc." — something like distillation. The latter approach seems really useful here, in part because it might help overcome a big problem in AI safety messaging: that a lot of the key points about risk are difficult to understand, and that important texts are technical. This means that it's easy to be shown cool demos of new AI systems, but not as easy to understand the arguments that explain why progress in AI might be dangerous. (So people trying to make the case in favor of safety might resort to deferring to experts, get the messages wrong in ways that make the listener unnecessarily skeptical of the overall case, etc.)
(More minor: I also think that the word "blast" has negative connotations which make it harder to correctly engage with the sentence. I think you mean "I'm not excited about sharing hyper-simplified messages in a way that reaches a ~random-but-large subset of people." I think I agree — it seems better to target a particular audience — but the way it's currently stated makes it harder to disagree; it's harder to say, "no, I think we should in fact blast some messages" than it is to say, "I think there are some messages that appeal to a very wide range of audiences," or to say "I think there are some messages we should promote extensively.")

(I should say that the opinions I'm sharing here are mine, not CEA's. I also think a lot of my opinions here are not very resilient.)

Marc Wong · Feb 10 2023 · 1

Whether it’s a knife, a car, social media, or artificial intelligence, technology is power.

There’s no reason why we shouldn’t use the familiar and mature car safety culture and practices to improve AI (and other technologies’) safety.

This means user training (driver licenses), built-in safety features (e.g., seat belts, air bags), frequent public service announcements, independent and rigorous safety and reliability reviews, rules and regulations (traffic rules), enforcement (traffic police), insurance, development and testing in controlled environments, guards against deliberate or accidental misuse, guards against (large) advances with (large) uncertainties, and promoting safe attitudes and mutual accountability (e.g., rejecting road rage).

If we can’t educate the public, media, technologists, and politicians in simple, engaging terms, and inspire them to take action, then we’ll always be at risk.

Technology is Power: Raising Awareness Of Technological Risks

Ardenlk · Jan 27 2023 · 6

Nice post. One thought on this - you wrote:

"I’d be especially excited for people to spread messages that help others understand - at a mechanistic level - how and why AI systems could end up with dangerous goals of their own, deceptive behavior, etc. I worry that by default, the concern sounds like lazy anthropomorphism (thinking of AIs just like humans)."

I agree that this seems good for avoiding the anthropomorphism (in perception and in one's own thought!) but I think it'll be important to emphasise when doing this that these are conceivable ways and ultimately possible examples rather than the whole risk-case. Why? People might otherwise think that they have solved the problem when they've ruled out or fixed that particular problematic mechanism, when really they haven't. Or when the more specific mechanistic descriptions probably end up wrong in some way, the whole case might be dismissed - when the argument for risk didn't ultimately depend on those particulars.

(This only applies if you are pretty unconfident in the particular mechanisms that will be risky vs. safe.)

[written in my personal capacity]

Holden Karnofsky · Mar 18 2023 · 2

Agreed!

PeterSlattery · Jan 27 2023 · 5

Thanks for sharing these suggestions, they are very helpful.

A quick comment:

I am also excited by the idea of spreading these messages and doing it well. I suspect that outside of my EA contacts most people in my network have never received a single message that made them aware of the more serious risks of AI, or know of a good source to learn about them. Given that awareness of opportunity is a prerequisite for desirable reactions (e.g., changes in career choice or personal advocacy), this seems very suboptimal.

I recently attempted to spread some AI risk related messages on LinkedIn. (Everyone on LinkedIn sees many posts about ChatGPT, so I no longer assign much probability to the chance that someone who reads a post about AI Safety will become aware of the potential of AI and decide to speed up capabilities research instead.)

When doing the posts I attempt to find a 'hook' that gets attention (e.g., I link and discuss an interesting video or outline some ways to use GPT3 - see posts linked below), share some personal views, then segue into a nudge to read a good source of AI risk related information.

My hope is that doing this occasionally can, over time, increase awareness of, and engagement with, good sources of information on AI risk, and have positive flow-on effects.

What is the likely alternative?

If I don't do posts like these it seems very unlikely that the people who read my posts will find out about AI risk for an extended period. I have yet to see a post on LinkedIn which mentions, or links to, an 'EA perspective' on AI risk. Rarely do I see anything negative about AI; when I do, such posts are focused on short-term risks related to unemployment etc.

However, I find it hard (at least within the time I assign to writing content for LinkedIn) to communicate complex ideas while also engaging people on social media, and I wonder if I am simplifying things too much in my content.

With that in mind, I'd appreciate feedback from anyone who is interested. This could be on my thoughts above, or on my two posts so far (see the two comments with links that I will add below). To leave feedback on the posts, please vote agree (if it seems ok/good to post like this) or disagree (if you think it is better not to), or reply to the relevant comment. Thanks!

Akash Kulgod · Jan 30 2023 · 5

No concrete useful feedback, just a note that I thought both posts were artfully tailored to your purpose and medium, nicely done!

PeterSlattery · Jan 31 2023 · 2

Thanks, Akash, I really appreciate that you reviewed them and shared that!

Post 1

PeterSlattery · Jan 27 2023 · 4

Post 2

freedomandutility · Jan 29 2023 · 4

Agree that in isolation, spreading the ideas of

(a) AI could be really powerful and important within our lifetimes

and

(b) Building AI too quickly/ incautiously could be dangerous

could backfire.

But I think just removing the "incautiously" element, and focusing on the "too quickly" element, and adding

Should be pretty effective in preventing people from thinking that we should race to creating AGI.

So essentially: AI could be really powerful, building it too quickly could be dangerous, and we should fund lots of AI Safety research before it's invented. I think adding more fidelity / detail / nuance would be net negative, given that it would slow down the spread of the message.

Also, I think we shouldn't take things OpenAI and DeepMind say at face value, and bear in mind the corrupting influence of the profit motive, motivated reasoning and 'safetywashing'.

Just because someone says they're making something that could make them billions of dollars because they think it will benefit humanity, doesn't mean they're actually doing it to benefit humanity. What they claim is a race to make safe AGI is probably significantly motivated by a race to make lots of money.
