RobBensinger
Machine Intelligence Research Institute, Berkeley, CA, USA

Leopold's scenario requires that the USG come to deeply understand all the perils and details of AGI and ASI (since it otherwise doesn't have a hope of building and aligning a superintelligence), but then choose to gamble its hegemony, its very existence, and the lives of all its citizens on a half-baked mad-science initiative, when it could simply work with its allies to block the tech's development and maintain the status quo at minimal risk.

Success in this scenario requires a strange combination of USG prescience and self-destructiveness: enough foresight to see what's coming, paired with a compulsion to race to build the very thing that puts its existence at risk, when it would potentially be vastly easier to spearhead an international alliance to prohibit this technology.

Three high-level reasons I think Leopold's plan looks a lot less workable:

  1. It requires major scientific breakthroughs to occur on a very short time horizon, including unknown breakthroughs that will manifest to solve problems we don't understand or know about today.
  2. These breakthroughs need to come in a field that has not been particularly productive or fast in the past. (Indeed, forecasters have been surprised by how slowly safety/robustness/etc. have progressed in recent years, and simultaneously surprised by the breakneck speed of capabilities.)
  3. It requires extremely precise and correct behavior by a giant government bureaucracy that includes many staff who won't be the best and brightest in the field -- inevitably, many technical and nontechnical people in the bureaucracy will have wrong beliefs about AGI and about alignment.

The "extremely precise and correct behavior" part means that we're effectively hoping to be handed an excellent bureaucracy that will rapidly and competently solve a thirty-digit combination lock requiring the invention of multiple new fields and the solving of a variety of thorny and poorly-understood technical problems -- all in a handful of years.

(It also requires that various empirical predictions all pan out. E.g., Leopold could do everything right and get the USG fully on board and get the USG doing literally everything right by his lights -- and then the plan ends up destroying the world rather than saving it because it turned out ASI was a lot more compute-efficient to train than he expected, resulting in the USG being unable to monopolize the tech and unable to achieve a sufficiently long lead time.)

My proposal doesn't require success of qualitatively that kind. It requires governments to coordinate on banning things. Plausibly, it requires governments to overreact to a weird, scary, and publicly controversial new tech to some degree, since it's unlikely that governments will exactly hit the target we want. This is not a particularly weird ask; governments ban things (and coordinate or copy-paste each other's laws) all the time, in far less dangerous and fraught areas than AGI. This is "trying to get the international order to lean hard in a particular direction on a yes-or-no question where there's already a lot of energy behind choosing 'no'", not "solving a long list of hard science and engineering problems in a matter of weeks and months and getting a bureaucracy to output the correct long string of digits to nail down all the correct technical solutions and all the correct processes for finding those solutions".

The CCP's current appetite for AGI seems remarkably small, and I expect them to be more worried that an AGI race would leave them in the dust (and/or put their regime at risk, and/or put their lives at risk) than excited about the opportunity such a race provides. Governments around the world are currently, to the best of my knowledge, nowhere near the cutting edge in ML. From my perspective, Leopold is imagining a future problem into being ("all of this changes") and then trying to find a galaxy-brained, incredibly complex, and assumption-laden way to wriggle out of that imagined future dilemma. The far easier and less risky path would be to not have the world powers race in the first place: have them recognize that this technology is lethally dangerous (something the USG chain of command, at least, would need to fully internalize on Leopold's plan too), and have them block private labs from sending us over the precipice (again, something Leopold assumes will happen), all while not choosing to take on the risk of destroying themselves (nor permitting other world powers to unilaterally impose that risk).

I think it's still good for some people to work on alignment research. The future is hard to predict, and we can't totally rule out a string of technical breakthroughs, and the overall option space looks gloomy enough (at least from my perspective) that we should be pursuing multiple options in parallel rather than putting all our eggs in one basket.

That said, I think "alignment research pans out to the level of letting us safely wield vastly superhuman AGI in the near future" is sufficiently unlikely that we definitely shouldn't be predicating our plans on that working out. AFAICT, Leopold's proposal is that we just lie down and die in the worlds where we can't align vastly superhuman AI, in exchange for doing better in the worlds where we can align it; that seems extremely reckless and backwards to me, throwing away higher-probability success worlds in exchange for more niche and unlikely success worlds.

I also think alignment researchers thus far, as a group, have mainly had the effect of shortening timelines. I want alignment research to happen, but not at the cost of reducing our hope in the worlds where alignment doesn't pan out, and thus far a lot of work labeled "alignment" has either seemed to accelerate the field toward AGI, or seemed to provide justification/cover for increasing the heat and competitiveness of the field, which seems pretty counterproductive to me.

Fair! That's at least a super nonstandard example of an "opinion poll".

“There’s a knock against prediction markets, here, too. A Metaculus forecast, in March of 2022 (the end of the period when one could make forecasts on this question), gave a 1.3% chance of FTX making any default on customer funds over the year. The probability that the Metaculus forecasters would have put on the claim that FTX would default on very large numbers of customer funds, as a result of misconduct, would presumably have been lower.”

Metaculus isn't a prediction market; it's just an opinion poll of people who use the Metaculus website.

“Since writing that post, though, I now lean more towards thinking that someone should ‘own’ managing the movement, and that that should be the Centre for Effective Altruism.”

I agree with this. Failing that, I feel strongly that CEA should change its name. There are costs to having a leader / manager / "coordinator-in-chief", and costs to not having such an entity; but the worst of both worlds is to have ambiguity about whether a person or org is filling that role. Then you end up with situations like "a bunch of EAs sit on their hands because they expect someone else to respond, but no one actually takes the wheel", or "an org gets the power of perceived leadership, but has limited accountability because it's left itself a lot of plausible deniability about exactly how much of a leader it is".

Update Apr. 15:  I talked to a CEA employee and got some more context on why CEA hasn't done an SBF investigation and postmortem. In addition to the 'this might be really difficult and it might not be very useful' concern, they mentioned that the Charity Commission investigation into EV UK is still ongoing a year and a half later. (Google suggests that statutory inquiries by the Charity Commission take an average of 1.2 years to complete, so the super long wait here is sadly normal.)

Although the Commission has said "there is no indication of wrongdoing by the trustees at this time", and the risk of anything crazy happening is lower now than it was a year and a half ago, I gather that it's still at least possible that the Commission could take some drastic action like "we think EV did bad stuff, so we're going to take over the legal entity that includes the UK components of CEA, 80K, GWWC, GovAI, etc.", which may make it harder for CEA to usefully hold the steering wheel on an SBF investigation at this stage.

Example scenario: CEA tries to write up some lessons learned from the SBF thing, with an EA audience in mind; EAs tend to have unusually high standards, and a CEA staffer writes a comment that assumes this context, without running the comment by lawyers because it seemed innocent enough; because of those high standards, the Charity Commission misreads the CEA employee as implying a way worse thing happened than is actually the case.

This particular scenario may not be a big risk, but the sum of the risks across all possible scenarios like that (including scenarios that might not currently be on their radar) seems non-negligible to the CEA person I spoke to, even though they don't think there's any info out there that should rationally cause the Charity Commission to do anything wild here. And trying to do serious public reflection or soul-searching, while also carefully nitpicking every sentence for possible ways the Charity Commission could misinterpret it, doesn't seem like an optimal set-up for deep, authentic, and productive soul-searching.

The CEA employee said that they thought this is one reason (but not the only reason) EV is unlikely to run a postmortem of this kind.

 

My initial thoughts on all this: This is very useful info! I had no idea the Charity Commission investigation was still ongoing, and if there are significant worries about that, that does indeed help make CEA and EV’s actions over the last year feel a lot less weird-and-mysterious to me.

I’m not sure I agree with CEA or EV’s choices here, but I no longer feel like there’s a mystery to be explained here; this seems like a place where reasonable people can easily disagree about what the right strategy is. I don't expect the Charity Commission to in fact take over those organizations, since as far as I know there's no reason to do that, but I can see how this would make it harder for CEA to do a soul-searching postmortem.

I do suspect that EV and/or CEA may be underestimating the costs of silence here. I could imagine a frog-boiling problem arising here, where it made sense to delay a postmortem for a few months based on a relatively small risk of disaster (and a hope that the Charity Commission investigation in this case might turn out to be brief), but it may not make sense to continue to delay in this situation for years on end. Both options are risky; I suspect the risks of inaction and silence may be getting systematically under-weighted here. (But it’s hard to be confident when I don’t know the specifics of how these decisions are being made.)

 

I ran the above by Oliver Habryka, who said:

“I talked to a CEA employee and got some more context on why CEA hasn't done an SBF investigation and postmortem.”

Seems like it wouldn't be too hard for them to just advocate for someone else doing it?

Or to just have whoever is leading the investigation leave the organization.

In general it seems to me that an investigation is probably best done in a relatively independent vehicle anyways, for many reasons.

“My thoughts on all this: This is very useful info! I had no idea the Charity Commission investigation was still ongoing, and that does indeed help make CEA and EV’s actions over the last year feel a lot less weird-and-mysterious to me.”

Agree that this is an important component (and a major component for my models).


I have some information suggesting that maybe Oliver's and/or the CEA employee's account is wrong, or missing part of the story? But I'm confused about the details, so I'll look into things more and post an update here if I learn more.

I feel like "people who worked with Sam told people about specific instances of quite serious dishonesty they had personally observed" is being classed as "rumour" here, which whilst not strictly inaccurate, is misleading, because it is a very atypical case relative to the image the word "rumour" conjures.

I agree with this.

“[...] I feel like we still want to know if anyone in leadership argued ‘oh, yeah, Sam might well be dodgy, but the expected value of publicly backing him is high because of the upside’. That's a signal someone is a bad leader in my view, which is useful knowledge going forward.”

I don't really agree with this. Everyone has some probability of turning out to be dodgy; it matters exactly how strong the available evidence was. "This EA leader writes people off immediately when they have even a tiny probability of being untrustworthy" would be a negative update about the person's decision-making too!

"Just focus on the arguments" isn't a decision-making algorithm, but I think informal processes like "just talk about it and individually do what makes sense" perform better than rigid algorithms in cases like this.

If we want something more formal, I tend to prefer approaches like "delegate the question to someone trustworthy who can spend a bunch of time carefully weighing the arguments" or "subsidize a prediction market to resolve the question" over "just run an opinion poll and do whatever the majority of people-who-see-the-poll vote for, without checking how informed or wise the respondents are".

Knowing what people think is useful, especially if it's a non-anonymous poll aimed at sparking conversations, questions, etc. (One thing that might help here is to include a field for people to leave a brief explanation of their vote, if the polling software allows for it.)

Anonymous polls are a bit trickier, since random people on the Internet can easily brigade such a poll. And I wouldn't want to assume that something's a good idea just because most EAs agree with it; I'd rather focus on the arguments for and against.
