On Deference and Yudkowsky's AI Risk Estimates

bgarfinkel

Note: I mostly wrote this post after Eliezer Yudkowsky’s “Death with Dignity” essay appeared on LessWrong. Since then, Jotto has written a post that overlaps a bit with this one, which sparked an extended discussion in the comments. You may want to look at that discussion as well. See also, here, for another relevant discussion thread.

EDIT: See here for some post-discussion reflections on what I think this post got right and wrong.

Introduction

Most people, when forming their own views on risks from misaligned AI, have some inclination to defer to others who they respect or think of as experts.

This is a reasonable thing to do, especially if you don’t yet know much about AI or haven’t yet spent much time scrutinizing the arguments. If someone you respect has spent years thinking about the subject, and believes the risk of catastrophe is very high, then you probably should take that information into account when forming your own views.

It’s understandable, then, if Eliezer Yudkowsky’s recent writing on AI risk helps to really freak some people out. Yudkowsky has probably spent more time thinking about AI risk than anyone else. Along with Nick Bostrom, he is the person most responsible for developing and popularizing these concerns. Yudkowsky has now begun to publicly express the view that misaligned AI has a virtually 100% chance of killing everyone on Earth - such that all we can hope to do is “die with dignity.”

The purpose of this post is, simply, to argue that people should be wary of deferring too much to Eliezer Yudkowsky, specifically, when it comes to estimating AI risk.^[1] In particular, I think, they shouldn’t defer to him more than they would defer to anyone else who is smart and has spent a large amount of time thinking about AI risk.^[2]

The post highlights what I regard as some negative aspects of Yudkowsky’s track record, when it comes to technological risk forecasting. I think these examples suggest that (a) his track record is at best fairly mixed and (b) he has some tendency toward expressing dramatic views with excessive confidence. As a result, I don’t personally see a strong justification for giving his current confident and dramatic views about AI risk a great deal of weight.^[3]

I agree it’s highly worthwhile to read and reflect on Yudkowsky’s arguments. I also agree that potential risks from misaligned AI deserve extremely serious attention - and are even, plausibly, more deserving of attention than any other existential risk.^[4] I also think it's important to note that many experts beyond Yudkowsky are very concerned about risks from misaligned AI. I just don’t think people should infer too much from the fact that Yudkowsky, specifically, believes we’re doomed.

Why write this post?

Before diving in, it may be worth saying a little more about why I hope this post might be useful. (Feel free to skip ahead if you're not interested in this section.)

In brief, it matters what the existential risk community believes about the risk from misaligned AI. I think that excessively high credences in doom can lead to:

poor prioritization decisions (underprioritizing other risks, including other potential existential risks from AI)
poor community health (anxiety and alienation)
poor reputation (seeming irrational, cultish, or potentially even threatening), which in turn can lead to poor recruitment or retention of people working on important problems^[5]

My own impression is that, although it's sensible to take potential risks from misaligned AI very seriously, a decent number of people are now more freaked out than they need to be. And I think that excessive deference to some highly visible intellectuals in this space, like Yudkowsky, may be playing an important role - either directly or through deference cascades.^[6] I'm especially concerned about new community members, who may be particularly inclined to defer to well-known figures and who may have particularly limited awareness of the diversity of views in this space. I've recently encountered some anecdotes I found worrisome.

Nothing I write in this post implies that people shouldn't freak out, of course, since I'm mostly not engaging with the substance of the relevant arguments (although I have done this elsewhere, for instance here, here, and here). If people are going to freak out about AI risk, then I at least want to help make sure that they’re freaking out for sufficiently good reasons.

Yudkowsky’s track record: some cherry-picked examples

Here, I’ve collected a number of examples of Yudkowsky making (in my view) dramatic and overconfident predictions concerning risks from technology.

Note that this isn’t an attempt to provide a balanced overview of Yudkowsky’s technological predictions over the years. I’m specifically highlighting a number of predictions that I think are underappreciated and suggest a particular kind of bias.

Doing a more comprehensive overview, which doesn’t involve specifically selecting poor predictions, would surely give a more positive impression. Hopefully this biased sample is meaningful enough, however, to support the claim that Yudkowsky’s track record is at least pretty mixed.^[7]

Also, a quick caveat: Unfortunately, but understandably, Yudkowsky didn’t have time to review this post and correct any inaccuracies. In various places, I’m summarizing or giving impressions of lengthy pieces I haven’t fully read, or haven't fully read in well over a year, so there's a decent chance that I’ve accidentally mischaracterized some of his views or arguments. Concretely: I think there’s something on the order of a 50% chance I’ll ultimately feel I should correct something below.

Fairly clearcut examples

1. Predicting near-term extinction from nanotech

At least up until 1999, admittedly when he was still only about 20 years old, Yudkowsky argued that transformative nanotechnology would probably emerge suddenly and soon (“no later than 2010”) and result in human extinction by default. My understanding is that this viewpoint was a substantial part of the justification for founding the institute that would become MIRI; the institute was initially focused on building AGI, since developing aligned superintelligence quickly enough was understood to be the only way to manage nanotech risk:

On the nanotechnology side, we possess machines capable of producing arbitrary DNA sequences, and we know how to turn arbitrary DNA sequences into arbitrary proteins (6). We have machines - Atomic Force Probes - that can put single atoms anywhere we like, and which have recently [1999] been demonstrated to be capable of forming atomic bonds. Hundredth-nanometer precision positioning, atomic-scale tweezers... the news just keeps on piling up…. If we had a time machine, 100K of information from the future could specify a protein that built a device that would give us nanotechnology overnight….

If you project on a graph the minimum size of the materials we can manipulate, it reaches the atomic level - nanotechnology - in I forget how many years (the page vanished), but I think around 2035. This, of course, was before the time of the Scanning Tunnelling Microscope and "IBM" spelled out in xenon atoms. For that matter, we now have the artificial atom ("You can make any kind of artificial atom - long, thin atoms and big, round atoms."), which has in a sense obsoleted merely molecular nanotechnology - the surest sign that nanotech is just around the corner. I believe Drexler is now giving the ballpark figure of 2013. My own guess would be no later than 2010…

Above all, I would really, really like the Singularity to arrive before nanotechnology, given the virtual certainty of deliberate misuse - misuse of a purely material (and thus, amoral) ultratechnology, one powerful enough to destroy the planet. We cannot just sit back and wait….

Mitchell Porter calls it "The race between superweapons and superintelligence." Human civilization will continue to change until we either create superintelligence, or wipe ourselves out. Those are the two stable states, the two "attractors". It doesn't matter how long it takes, or how many cycles of nanowar-and-regrowth occur before Transcendence or final extinction. If the system keeps changing, over a thousand years, or a million years, or a billion years, it will eventually wind up in one attractor or the other. But my best guess is that the issue will be settled now.

I should, once again, emphasize that Yudkowsky was around twenty when he did the final updates on this essay. In that sense, it might be unfair to bring this very old example up.

Nonetheless, I do think this case can be treated as informative, since: the belief was so analogous to his current belief about AI (a high outlier credence in near-term doom from an emerging technology), since he had thought a lot about the subject and was already highly engaged in the relevant intellectual community, since it's not clear when he dropped the belief, and since twenty isn't (in my view) actually all that young. I do know a lot of people in their early twenties; I think their current work and styles of thought are likely to be predictive of their work and styles of thought in the future, even though I do of course expect the quality to go up over time.

2. Predicting that his team had a substantial chance of building AGI before 2010

In 2001, and possibly later, Yudkowsky apparently believed that his small team would be able to develop a “final stage AI” that would “reach transhumanity sometime between 2005 and 2020, probably around 2008 or 2010.”

In the first half of the 2000s, he produced a fair amount of technical and conceptual work related to this goal. It hasn't ultimately had much clear usefulness for AI development, and, partly on this basis, my impression is that it has not held up well - but that he was very confident in the value of this work at the time.

The key points here are that:

Yudkowsky has previously held short AI timeline views that turned out to be wrong
Yudkowsky has previously held really confident inside views about the path to AGI that (at least seemingly) turned out to be wrong
More generally, Yudkowsky may have a track record of overestimating or overstating the quality of his insights into AI

Flare

Although I haven’t evaluated the work, my impression is that Yudkowsky was a key part of a Singularity Institute effort to develop a new programming language to use to create “seed AI.” He (or whoever was writing the description of the project) seems to have been substantially overconfident about its usefulness. From the section of the documentation titled “Foreword: Earth Needs Flare” (2001):

A new programming language has to be really good to survive. A new language needs to represent a quantum leap just to be in the game. Well, we're going to be up-front about this: Flare is really good. There are concepts in Flare that have never been seen before. We expect to be able to solve problems in Flare that cannot realistically be solved in any other language. We expect that people who learn to read Flare will think about programming differently and solve problems in new ways, even if they never write a single line of Flare…. Flare was created under the auspices of the Singularity Institute for Artificial Intelligence, an organization created with the mission of building a computer program far before its time - a true Artificial Intelligence. Flare, the programming language they asked for to help achieve that goal, is not that far out of time, but it's still a special language.

Coding a Transhuman AI

I haven’t read it, to my discredit, but “Coding a Transhuman AI 2.2” is another piece of technical writing by Yudkowsky that one could look at. The document is described as “the first serious attempt to design an AI which has the potential to become smarter than human,” and aims to “describe the principles, paradigms, cognitive architecture, and cognitive components needed to build a complete mind possessed of general intelligence.”

From a skim, I suspect there’s a good chance it hasn’t held up well - since I’m not aware of any promising later work that builds on it and since it doesn’t seem to have been written with the ML paradigm in mind - but can’t currently give an informed take.

Levels of Organization in General Intelligence

A later piece of work which I also haven’t properly read is “Levels of Organization in General Intelligence.” At least by 2005, going off of Yudkowsky’s post “So You Want to be a Seed AI Programmer,” it seems like he thought a variation of the framework in this paper would make it possible for a very small team at the Singularity Institute to create AGI:

There's a tradeoff between the depth of AI theory, the amount of time it takes to implement the project, the number of people required, and how smart those people need to be. The AI theory we're planning to use - not LOGI, LOGI's successor - will save time and it means that the project may be able to get by with fewer people. But those few people will have to be brilliant…. The theory of AI is a lot easier than the practice, so if you can learn the practice at all, you should be able to pick up the theory on pretty much the first try. The current theory of AI I'm using is considerably deeper than what's currently online in Levels of Organization in General Intelligence - so if you'll be able to master the new theory at all, you shouldn't have had trouble with LOGI. I know people who did comprehend LOGI on the first try; who can complete patterns and jump ahead in explanations and get everything right, who can rapidly fill in gaps from just a few hints, who still don't have the level of ability needed to work on an AI project.

Somewhat disputable examples

I think of the previous two examples as predictions that resolved negatively. I'll now cover a few predictions that we don't yet know are wrong (e.g. predictions about the role of compute in developing AGI), but that I think we now have reason to regard as significantly overconfident.

3. Having high confidence that AI progress would be extremely discontinuous and localized and not require much compute

In his 2008 "FOOM debate" with Robin Hanson, Yudkowsky confidently staked out very extreme positions about what future AI progress would look like - without (in my view) offering strong justifications. The past decade of AI progress has also provided further evidence against the correctness of his core predictions.

A quote from the debate, describing the median development scenario he was imagining at the time:

When we try to visualize how all this is likely to go down, we tend to visualize a scenario that someone else once termed “a brain in a box in a basement.” I love that phrase, so I stole it. In other words, we tend to visualize that there’s this AI programming team, a lot like the sort of wannabe AI programming teams you see nowadays, trying to create artificial general intelligence, like the artificial general intelligence projects you see nowadays. They manage to acquire some new deep insights which, combined with published insights in the general scientific community, let them go down into their basement and work on it for a while and create an AI which is smart enough to reprogram itself, and then you get an intelligence explosion…. (p. 436)

The idea (as I understand it) was that AI progress would have very little impact on the world, then a small team of people with a very small amount of computing power would have some key insight, then they’d write some code for an AI system, then that system would rewrite its own code, and then it would shortly after take over the world.

When pressed by his debate partner, regarding the magnitude of the technological jump he was forecasting, Yudkowsky suggested that economic output could at least plausibly rise by twenty orders-of-magnitude within not much more than a week - once the AI system has developed relevant nanotechnologies (p. 400).^[8] To give a sense of how extreme that is: If you extrapolate twenty-orders-of-magnitude-per-week over the course of a year - although, of course, no one expected this rate to be maintained for anywhere close to a year - it is equivalent to an annual economic growth rate of roughly (10^1000)%.
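As a rough check on the extrapolation above, here is a minimal sketch of the arithmetic. It assumes a 52-week year and a rate held constant for the whole year (which, as noted, nobody actually expected); the function names are illustrative only and do not come from the debate.

```python
ORDERS_PER_WEEK = 20   # figure raised in the debate
WEEKS_PER_YEAR = 52    # assumption: a 52-week year, rate held constant


def annual_orders_of_magnitude(orders_per_week: int, weeks: int) -> int:
    """Orders of magnitude compound by adding exponents week over week."""
    return orders_per_week * weeks


def percent_growth_digits(orders: int) -> int:
    """Number of decimal digits in the exact annual percentage growth figure."""
    growth_factor = 10 ** orders                 # output is multiplied by this
    percent_growth = (growth_factor - 1) * 100   # growth expressed as a percentage
    return len(str(percent_growth))


orders = annual_orders_of_magnitude(ORDERS_PER_WEEK, WEEKS_PER_YEAR)
digits = percent_growth_digits(orders)
print(f"{orders} orders of magnitude per year")
print(f"annual growth is a {digits}-digit percentage, just under 10^{digits}%")
```

Under these assumptions the exponent comes out somewhat above 1000, so the figure in the text is best read as a round order-of-magnitude illustration rather than an exact value.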

I think it’s pretty clear that this viewpoint was heavily influenced by the reigning AI paradigm at the time, which was closer to traditional programming than machine learning. The emphasis on “coding” (as opposed to training) as the means of improvement, the assumption that large amounts of compute are unnecessary, etc. seem to follow from this. A large part of the debate was Yudkowsky arguing against Hanson, who thought that Yudkowsky was underrating the importance of compute and “content” (i.e. data) as drivers of AI progress. Although Hanson very clearly wasn’t envisioning something like deep learning either^[9], his side of the argument seems to fit better with what AI progress has looked like over the past decade. In particular, huge amounts of compute and data have clearly been central to recent AI progress and are currently commonly thought to be central - or, at least, necessary - for future progress.

In my view, the pro-FOOM essays in the debate also just offered very weak justifications for thinking that a small number of insights could allow a small programming team, with a small amount of computing power, to abruptly jump the economic growth rate up by several orders of magnitude. The main reasons that stood out to me, from the debate, are these:^[10]

It requires less than a gigabyte to store someone’s genetic information on a computer (p. 444).^[11]
The brain “just doesn’t look all that complicated” in comparison to human-made pieces of technology such as computer operating systems (p. 444), on the basis of the principles that have been worked out by neuroscientists and cognitive scientists.
There is a large gap between the accomplishments of humans and chimpanzees, which Yudkowsky attributes to a small architectural improvement: “If we look at the world today, we find that taking a little bit out of the architecture produces something that is just not in the running as an ally or a competitor when it comes to doing cognitive labor….[T]here are no branches of science where chimpanzees do better because they have mostly the same architecture and more relevant content” (p. 448).
Although natural selection can be conceptualized as implementing a simple algorithm, it was nonetheless capable of creating the human mind.
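The storage figure in the first point above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is only illustrative and rests on stated assumptions: a genome of roughly 3.1 billion base pairs, two bits per base (four possible nucleotides), no compression, and a hypothetical average of 75 bytes per line of source code for the lines-of-code comparison. None of these numbers come from the debate itself.

```python
BASE_PAIRS = 3_100_000_000   # assumption: approximate length of one human genome copy
BITS_PER_BASE = 2            # four nucleotides (A, C, G, T) fit in 2 bits
BYTES_PER_LINE = 75          # assumption: hypothetical average size of a line of code


def genome_megabytes(base_pairs: int = BASE_PAIRS,
                     bits_per_base: int = BITS_PER_BASE) -> float:
    """Uncompressed genome size in megabytes (1 MB = 1,000,000 bytes)."""
    return base_pairs * bits_per_base / 8 / 1_000_000


def equivalent_lines_of_code(megabytes: float,
                             bytes_per_line: int = BYTES_PER_LINE) -> float:
    """How many lines of code would occupy the same number of bytes."""
    return megabytes * 1_000_000 / bytes_per_line


size_mb = genome_megabytes()
lines = equivalent_lines_of_code(size_mb)
print(f"genome: about {size_mb:.0f} MB")
print(f"equivalent to about {lines / 1_000_000:.1f} million lines of code")
```

On these assumptions the result sits below a gigabyte and on the order of ten million lines of code, which is the scale the argument relies on. The arithmetic says nothing, of course, about how much compute or insight would be needed to produce such a program.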

I think that Yudkowsky's prediction - that a small amount of code, run using only a small amount of computing power, was likely to abruptly jump economic output upward by more than a dozen orders-of-magnitude - was extreme enough to require very strong justifications. My view is that his justifications simply weren't that strong. Given the way AI progress has looked over the past decade, his prediction also seems very likely to resolve negatively.^[12]

4. Treating early AI risk arguments as close to decisive

In my view, the arguments for AI risk that Yudkowsky had developed by the early 2010s had a lot of very important gaps. They were suggestive of a real risk, but were still far from worked out enough to justify very high credences in extinction from misaligned AI. Nonetheless, Yudkowsky recalls his credence in doom was "around the 50% range" at the time, and his public writing tended to suggest that he saw the arguments as very tight and decisive.

These slides summarize what I see as gaps in the AI risk argument that appear in Yudkowsky’s essays/papers and in Superintelligence, which presents somewhat fleshed out and tweaked versions of Yudkowsky’s arguments. This podcast episode covers most of the same points. (Note that almost none of these objections I walk through are entirely original to me.)

You can judge for yourself whether these criticisms of his arguments are fair. If they seem unfair to you, then, of course, you should disregard this as an illustration of an overconfident prediction. One additional piece of evidence, though, is that his arguments focused on a fairly specific catastrophe scenario that most researchers now assign less weight to than they did when they first entered the field.

For instance, the classic arguments used an extremely sudden "AI takeoff" as a central premise. Arguably, fast takeoff was the central premise, since presentations of the risk often began by establishing that there is likely to be a fast take-off (and thus an opportunity for a decisive strategic advantage) and then built the remainder of the argument on top of this foundation. However, many people in the field have now moved away from finding sudden take-off arguments compelling (e.g. for the kinds of reasons discussed here and here).

My point, here, is not necessarily that Yudkowsky was wrong, but rather that he held a much higher credence in existential risk from AI than his arguments justified at the time. The arguments had pretty crucial gaps that still needed to be resolved^[13], but, I believe, his public writing tended to suggest that these arguments were tight and sufficient to justify very high credences in doom.

5. Treating "coherence arguments" as forceful

In the mid-2010s, some presentations of AI risk began to lean heavily on “coherence arguments” (i.e. arguments that draw implications from the von Neumann-Morgenstern utility theorem). See, for instance, this introduction to AI risk from 2016, by Yudkowsky, which places a coherence argument front and center as a foundation for the rest of the presentation. I think it's probably fair to guess that the introduction-to-AI-risk talk that Yudkowsky was giving in 2016 contained what he regarded as the strongest concise arguments available.

However, later analysis has suggested that coherence arguments have either no or very limited implications for how we should expect future AI systems to behave. See Rohin Shah’s (I think correct) objection to the use of “coherence arguments” to support AI risk concerns. See also similar objections by Richard Ngo and Eric Drexler (Section 6.4).

Unfortunately, this is another case where the significance of this example depends on how much validity you assign to a given critique. In my view, the critique is strong. However, I'm unsure what portion of alignment researchers currently agree with me. I do know of at least one prominent researcher who was convinced by it; people also don't seem to make coherence arguments very often anymore, which perhaps suggests that the critiques have gotten traction. However, if you have the time and energy, you should reflect on the critiques for yourself.^[14]

If the critique is valid, then this would be another example of Yudkowsky significantly overestimating the strength of an argument for AI risk.

EDIT: See here for a useful clarification by Rohin.

A somewhat meta example

6. Not acknowledging his mixed track record

So far as I know, although I certainly haven't read all of his writing, Yudkowsky has never (at least publicly) seemed to take into account the mixed track record outlined above - including the relatively unambiguous misses.

He has written about mistakes from early on in his intellectual life (particularly pre-2003) and has, on this basis, even made a blanket statement disavowing his pre-2003 work. However, based on my memory and a quick re-read/re-skim, this writing is an exploration of why it took him a long time to become extremely concerned about existential risks from misaligned AI. For instance, the main issue it discusses with his plans to build AGI is that these plans didn't take into account the difficulty and importance of ensuring alignment. This writing isn't, I think, an exploration or acknowledgement of the kinds of mistakes I've listed in this post.

The fact he seemingly hasn’t taken these mistakes into account - and, if anything, tends to write in a way that suggests he holds a very high opinion of his technological forecasting track record - leads me to trust his current judgments less than I otherwise would.

To be clear, Yudkowsky isn’t asking other people to defer to him. He’s spent a huge amount of time outlining his views (allowing people to evaluate them on their merits) and has often expressed concerns about excessive epistemic deference. ↩︎
A better, but still far-from-optimal approach to deference might be to give a lot of weight to the "average" view within the pool of smart people who have spent a reasonable amount of time thinking about AI risk. This still isn't great, though, since different people do deserve different amounts of weight, and since there's at least some reason to think that selection effects might bias this pool toward overestimating the level of risk. ↩︎
It might be worth emphasizing that I’m not making any claim about the relative quality of my own track record. ↩︎
To say something concrete about my current views on misalignment risk: I'm currently inclined to assign a low-to-mid-single-digits probability to existential risk from misaligned AI this century, with a lot of volatility in my views. This is of course, in some sense, still extremely high! ↩︎
I think that expressing extremely high credences in existential risk (without sufficiently strong and clear justification) can also lead some people to simply dismiss the concerns. It is often easier to be taken seriously, when talking about strange and extreme things, if you express significant uncertainty. Importantly, I don't think this means that people should ever misrepresent their levels of concern about existential risks; dishonesty seems like a really bad and corrosive policy. Still, this is one extra reason to think that it can be important to avoid overestimating risks. ↩︎
Yudkowsky is obviously a pretty polarizing figure. I'd also say that some people are probably too dismissive of him, for example because they assign too much significance to his lack of traditional credentials. But it also seems clear that many people are inclined to give Yudkowsky's views a great deal of weight. I've even encountered the idea that Yudkowsky is virtually the only person capable of thinking about alignment risk clearly. ↩︎
I think that cherry-picking examples from someone's forecasting track record is normally bad to do, even if you flag that you're engaged in cherry-picking. However, I do think (or at least hope) that it's fair in cases where someone already has a very high level of respect and frequently draws attention to their own successful predictions. ↩︎
I don't mean to suggest that the specific twenty orders-of-magnitude of growth figure was the result of deep reflection or was Yudkowsky's median estimate. Here is the specific quote, in response to Hanson raising the twenty orders-of-magnitude-in-a-week number: "Twenty orders of magnitude in a week doesn’t sound right, unless you’re talking about the tail end after the AI gets nanotechnology. Figure more like some number of years to push the AI up to a critical point, two to six orders of magnitude improvement from there to nanotech, then some more orders of magnitude after that." I think that my general point, that this is a very extreme prediction, stays the same even if we lower the number to ten orders-of-magnitude and assume that there will be a bit of a lag between the 'critical point' and the development of the relevant nanotechnology. ↩︎
As an example of a failed prediction or piece of analysis on the other side of the FOOM debate, Hanson praised the CYC project - which lies far afield of the current deep learning paradigm and now looks like a clear dead end. ↩︎
Yudkowsky also provides a number of arguments in favor of the view that the human mind can be massively improved upon. I think these arguments are mostly right. However, I think, they don't have any very strong implications for the question of whether AI progress will be compute-intensive, sudden, or localized. ↩︎
To probe just the relevance of this one piece of evidence, specifically, let’s suppose that it’s appropriate to use the length of a person’s genome in bits of information as an upper bound on the minimum amount of code required to produce a system that shares their cognitive abilities (excluding code associated with digital environments). This would imply that it is in principle possible to train an ML model that can do anything a given person can do, using something on the order of 10 million lines of code. But even if we accept this hypothesis - which seems quite plausible to me - it doesn’t seem to me like this implies much about the relative contributions of architecture and compute to AI progress or the extent to which progress in architecture design is driven by “deep insights.” For example, why couldn’t it be true that it is possible to develop a human-equivalent system using fewer than 10 million lines of code and also true that computing power (rather than insight) is the main bottleneck to developing such a system? ↩︎
Two caveats regarding my discussion of the FOOM debate:

First, I should emphasize that, although I think Yudkowsky’s arguments were weak when it came to the central hypothesis being debated, his views were in some other regards more reasonable than his debate partner’s. See here for comments by Paul Christiano on how well various views Yudkowsky expressed in the FOOM debate have held up.

Second, it's been a few years since I've read the FOOM debate - and there's a lot in there (the book version of it is 741 pages long) - so I wouldn't be surprised if my high-level characterization of Yudkowsky's arguments is importantly misleading. My characterization here is based on some rough notes I took the last time I read it. ↩︎
For example, it may be possible to construct very strong arguments for AI risk that don't rely on the fast take-off assumption. However, in practice, I think it's fair to say that the classic arguments did rely on this assumption. If the assumption wasn't actually very justified, then, I think, it seems to follow that having a very high credence in AI risk also wasn't justified at the time. ↩︎
Here’s another example of an argument that’s risen to prominence in the past few years, and plays an important role in some presentations of AI risk, that I now suspect simply might not work. This argument shows up, for example, in Yudkowsky’s recent post “AGI Ruin: A List of Lethalities,” at the top of the section outlining “central difficulties.” ↩︎

Comments

richard_ngo, Jun 20 2022

EDIT: I've now written up my own account of how we should do epistemic deference in general, which fleshes out more clearly a bunch of the intuitions I outline in this comment thread.

I think that a bunch of people are overindexing on Yudkowsky's views; I've nevertheless downvoted this post because it seems like it's making claims that are significantly too strong, based on a methodology that I strongly disendorse. I'd much prefer a version of this post which, rather than essentially saying "pay less attention to Yudkowsky", is more nuanced about how to update based on his previous contributions; I've tried to do that in this comment, for example. (More generally, rather than reading this post, I recommend people read this one by Paul Christiano, which outlines specific agreements and disagreements. Note that the list of agreements there, which I expect that many other alignment researchers also buy into, serves as a significant testament to Yudkowsky's track record.)

The part of this post which seems most wild to me is the leap from "mixed track record" to

In particular, I think, they shouldn’t defer to him more than they would defer to anyone else who seems smart and has spent a rea

... (read more)

bgarfinkel · Jun 20 2022 · 83

The part of this post which seems most wild to me is the leap from "mixed track record" to

In particular, I think, they shouldn’t defer to him more than they would defer to anyone else who seems smart and has spent a reasonable amount of time thinking about AI risk.

For any reasonable interpretation of this sentence, it's transparently false. Yudkowsky has proven to be one of the best few thinkers in the world on a very difficult topic. Insofar as there are others who you couldn't write a similar "mixed track record" post about, it's almost entirely because they don't have a track record of making any big claims, in large part because they weren't able to generate the relevant early insights themselves. Breaking ground in novel domains is very, very different from forecasting the weather or events next year; a mixed track record is the price of entry.

I disagree that the sentence is false for the interpretation I have in mind.

I think it's really important to separate out the question "Is Yudkowsky an unusually innovative thinker?" and the question "Is Yudkowsky someone whose credences you should give an unusual amount of weight to?"

I read your comment as arguing for the former,... (read more)

richard_ngo · Jun 20 2022 · 53

I phrased my reply strongly (e.g. telling people to read the other post instead of this one) because deference epistemology is intrinsically closely linked to status interactions, and you need to be pretty careful in order to make this kind of post not end up being, in effect, a one-dimensional "downweight this person". I don't think this post was anywhere near careful enough to avoid that effect. That seems particularly bad because I think most EAs should significantly upweight Yudkowsky's views if they're doing any kind of reasonable, careful deference, because most EAs significantly underweight how heavy-tailed the production of innovative ideas actually is (e.g. because of hindsight bias, it's hard to realise how much worse than Eliezer we would have been at inventing the arguments for AI risk, and how many dumb things we would have said in his position).

By contrast, I think your post is implicitly using a model where we have a few existing, well-identified questions, and the most important thing is to just get to the best credences on those questions, and we should do so partly by just updating in the direction of experts. But I think this model of deference is rarely relevant... (read more)

Rohin Shah · Jun 20 2022 · 60

when it comes to making big-picture forecasts, the main value of deference is in helping us decide which ideas and arguments to take seriously, rather than the specific credences we should place on them, since the space of ideas is so large.

This seems like an overly research-centric position.

When your job is to come up with novel relevant stuff in a domain, then I agree that it's mostly about "which ideas and arguments to take seriously" rather than specific credences.

When your job is to make decisions right now, the specific credences matter. Some examples:

Any cause prioritization decision, e.g. should funders reallocate nearly all biosecurity money to AI?
What should AI-focused community builders provide as starting resources?
Should there be an organization dedicated to solving Eliezer's health problems? What should its budget be?
Should people try to solve technical AI alignment or try to, idk, create a culture of secrecy within AGI labs?

richard_ngo · Jun 20 2022 · 13

I think that there are very few decisions which are both a) that low-dimensional and b) actually sensitive to the relevant range of credences that we're talking about.

Like, suppose you think that Eliezer's credences on his biggest claims are literally 2x higher than they should be, even for claims where he's 90% confident. This is a huge hit in terms of Bayes points; if that's how you determine deference, and you believe he's 2x off, then plausibly that implies you should defer to him less than you do to the median EA. But when it comes to grantmaking, for example, a cost-effectiveness factor of 2x is negligible given the other uncertainties involved - this should very rarely move you from a yes to no, or vice versa. (edit: I should restrict the scope here to grantmaking in complex, high-uncertainty domains like AI alignment).

Then you might say: well, okay, we're not just making binary decisions, we're making complex decisions where we're choosing between lots of different options. But the more complex the decisions you're making, the less you should care about whether somebody's credences on a few key claims are accurate, and the more you should care about whether they're identify... (read more)

CarlShulman · Jun 20 2022 · 28

Like, suppose you think that Eliezer's credences on his biggest claims are literally 2x higher than they should be, even for claims where he's 90% confident. This is a huge hit in terms of Bayes points; if that's how you determine deference, and you believe he's 2x off, then plausibly you should defer to him less than you do to the median EA. But when it comes to grantmaking, for example, a cost-effectiveness factor of 2x is negligible given the other uncertainties involved - this should very rarely move you from a yes to no, or vice versa.

Such differences are crucial for many of the most important grant areas IME, because they are areas where you are trading off multiple high-stakes concerns. E.g. in nuclear policy all the strategies on offer have arguments that they might lead to nuclear war or worse war. On AI alignment there are multiple such tradeoffs and people embracing strategies to push the same variable in opposite directions with high stakes on both sides.

richard_ngo

Jun 20 2022

I haven't thought much about nuclear policy, so I can't respond there. But at least in alignment, I expect that pushing on variables where there's less than a 2x difference between the expected positive and negative effects of changing that variable is not a good use of time for altruistically-motivated people. (By contrast, upweighting or downweighting Eliezer's opinions by a factor of 2 could lead to significant shifts in expected value, especially for people who are highly deferential. The specific thing I think doesn't make much difference is deferring to a version of Eliezer who's 90% confident about something, versus deferring to the same extent to a version of Eliezer who's 45% confident in the same thing.) My more general point, which doesn't hinge on the specific 2x claim, is that naive conversions between metrics of calibration and deferential weightings are a bad idea, and that a good way to avoid naive conversions is to care a lot more about innovative thinking than calibration when deferring.

Rohin Shah · Jun 22 2022 · 26

Like, suppose you think that Eliezer's credences on his biggest claims are literally 2x higher than they should be, even for claims where he's 90% confident.

I think differences between Eliezer + my views often make way more than a 2x difference to the bottom line. I'm not sure why you're only considering probabilities on specific claims; when I think of "deferring" I also imagine deferring on estimates of usefulness of various actions, which can much more easily have OOMs of difference.

(Fwiw I also think Eliezer is way more than 2x too high for probabilities on many claims, though I don't think that matters much for my point.)

Taking my examples:

should funders reallocate nearly all biosecurity money to AI?

Since Eliezer thinks something like 99.99% chance of doom from AI, that reduces cost effectiveness of all x-risk-targeted biosecurity work by a factor of 10,000x (since only in 1 in 10,000 worlds does the reduced bio x-risk matter at all), whereas if you have < 50% of doom from AI (as I do) then that's a discount factor of < 2x on x-risk-targeted biosecurity work. So that's almost 4 OOMs of difference.
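The arithmetic in the paragraph above can be checked with a short sketch. It assumes, as the comment does, that x-risk-targeted biosecurity work only pays off in worlds where AI does not cause doom, so its value is divided by 1 / (1 - p_doom). The function name `discount_factor` and the use of exactly 50% to stand in for the "< 50%" case are illustrative choices, not anything stated in the thread.

```python
import math


def discount_factor(p_doom_ai: float) -> float:
    """Factor by which non-AI x-risk work is discounted, assuming it only
    matters in the fraction (1 - p_doom_ai) of worlds that survive AI."""
    return 1.0 / (1.0 - p_doom_ai)


# Hypothetical inputs taken from the figures quoted in the comment.
high = discount_factor(0.9999)  # roughly 10,000x
low = discount_factor(0.5)      # 2x, the upper bound for anyone below 50%

# Gap between the two views, in orders of magnitude (log10 of the ratio).
gap_ooms = math.log10(high / low)

print(f"discount at 99.99% doom: {high:,.0f}x")
print(f"discount at 50% doom:    {low:.0f}x")
print(f"gap: {gap_ooms:.2f} orders of magnitude")
```

The ratio is about 5,000, which is where the "almost 4 OOMs" figure comes from; a lower p(doom) on the second view would only widen the gap.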

What should AI-focused community builders provide as starting resources?

Eli... (read more)

richard_ngo · Jun 23 2022 · 12

We both agree that you shouldn't defer to Eliezer's literal credences, because we both think he's systematically overconfident. The debate is between two responses to that:

a) Give him less deference weight than the cautious, sober, AI safety people who make few novel claims but are better-calibrated (which is what Ben advocates).

b) Try to adjust for his overconfidence and then give significant deference weight to a version of his worldview that isn't overconfident.

I say you should do the latter, because you should be deferring to coherent worldviews (which are rare) rather than deferring on a question-by-question basis. This becomes more and more true the more complex the decisions you have to make. Even for your (pretty simple) examples, the type of deference you seem to be advocating doesn't make much sense.

For instance:

should funders reallocate nearly all biosecurity money to AI?

It doesn't make sense to defer to Eliezer's estimate of the relative importance of AI without also accounting for his estimate of the relative tractability of funding AI, which I infer he thinks is very low.

Should there be an organization dedicated to solving Eliezer's health problems? What should

... (read more)

richard_ngo · Jun 23 2022 · 11

Musing out loud: I don't know of any complete model of deference which doesn't run into weird issues, like the conclusion that you should never trust yourself. But suppose you have some kind of epistemic parliament where you give your own views some number of votes, and assign the rest of the votes to other people in proportion to how defer-worthy they seem. Then you need to make a bunch of decisions, and your epistemic parliament keeps voting on what will best achieve your (fixed) goals.

If you do naive question-by-question majority voting on each question simultaneously then you can end up with an arbitrarily incoherent policy - i.e. a set of decisions that's inconsistent with each other. And if you make the decisions in some order, with the constraint that they each have to be consistent with all prior decisions, then the ordering of the decisions can become arbitrarily important.

Instead, you want your parliament to negotiate some more coherent joint policy to follow. And I expect that in this joint policy, each worldview gets its way on the questions that are most important to it, and cedes responsibility on the questions that are least important. So Eliezer's worldview doesn't end up reallocating all the biosecurity money, but it does get a share of curriculum time (at least for the most promising potential researchers). But in general how to conduct those negotiations is an unsolved problem (and pretty plausibly unsolvable).

Rohin Shah

Jun 23 2022

There's lots of things you can do under Eliezer's worldview that add dignity points, like paying relevant people millions of dollars to spend a week really engaging with the arguments, or trying to get whole-brain emulation before AGI. My understanding is that he doesn't expect those sorts of things to happen.

This seems like a crazy way to do cost-effectiveness analyses. Like, if I were comparing deworming to GiveDirectly, would I be saying "well, the value of deworming is mainly determined by the likelihood that the pro-deworming people are right, which I estimate is 70% but you estimate is 50%, so there's only a 1.4x difference"? Something has clearly gone wrong here.

It also feels like this reasoning implies that no EA action can be > 10x more valuable than any other action that an EA critic thinks is good? Since you assign a 90% chance that the EA is right, and the critic thinks there's a 10% chance of that, so there's only a 9x gap? And then once you do all of your adjustments it's only 2x? Why do we even bother with cause prioritization under this worldview?

I don't have a fleshed out theory of how and when to defer, but I feel pretty confident that even our intuitive pretheoretic deference should not be this sort of thing, and should be the sort of thing that can have orders of magnitude of difference between actions. (One major thing is that I think you should be comparing between two actions, rather than evaluating an action by itself, which is why I compared to "all other alignment work".)

I don't see why you are not including "c) give significant deference weight to his actual worldview", which is what I'd be inclined to do if I didn't have significant AI expertise myself and so was trying to defer.

(Aside: note that Ben said "they shouldn’t defer to him more than they would defer to anyone else who seems smart and has spent a reasonable amount of time thinking about AI risk", which is slightly different from your rephrasing, but that's a nitpick)

richard_ngo · Jun 24 2022 · 10

(One major thing is that I think you should be comparing between two actions, rather than evaluating an action by itself, which is why I compared to "all other alignment work".)

IMO the crux is that I disagree with both of these. Instead I think you should use each worldview to calculate a policy, and then generate some kind of compromise between those policies. My arguments above were aiming to establish that this strategy is not very sensitive to exactly how much you defer to Eliezer, because there just aren't very many good worldviews going around - hence why I assign maybe 15 or 20% (inside view) credence to his worldview (updated from 10% above after reflection). (I think my all-things-considered view is similar, actually, because deference to him cancels out against deference to all the people who think he's totally wrong.)

Again, the difference is in large part determined by whether you think you're in a low-dimensional space (here are our two actions, which one should we take?) versus a high-dimensional space (millions of actions available to us, how do we narrow it down?) In a high-dimensional space the tradeoffs between the best ways to generate utility according to Eliezer... (read more)

Rohin Shah

Jun 24 2022

Okay, my new understanding of your view is that you're suggesting that (if one is going to defer) one should:

1. Identify a panel of people to defer to
2. Assign them weights based on how good they seem (e.g. track record, quality and novelty of ideas, etc)
3. Allocate resources to [policies advocated by person X] in proportion to [weight assigned to person X].

I agree that (a) this is a reasonable deference model and (b) under this deference model most of my calculations and questions in this thread don't particularly make sense to think about.

However, I still disagree with the original claim I was disagreeing with: Even in this new deference model, it seems like the specific weights chosen in step 2 are a pretty big deal (which seem like the obvious analogues of "credences", and the sort of thing that Ben's post would influence). If you switch from a weight of 0.5 to a weight of 0.3, that's a reallocation of 20% of your resources, which is pretty large!
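As a rough illustration of step 3 in the model described above, here is a minimal sketch of allocating a budget in proportion to deference weights. The panel names, the 100-unit budget, and the helper `allocate` are all hypothetical; only the 0.5 and 0.3 weights come from the comment.

```python
def allocate(total: float, weights: dict[str, float]) -> dict[str, float]:
    """Split `total` across panel members in proportion to their weights."""
    norm = sum(weights.values())
    return {name: total * w / norm for name, w in weights.items()}


# Hypothetical two-member panel: "X" and everyone else lumped together.
before = allocate(100.0, {"X": 0.5, "others": 0.5})
after = allocate(100.0, {"X": 0.3, "others": 0.7})

# Resources moved away from X's recommended policies by the weight change.
shift = before["X"] - after["X"]
print(f"X before: {before['X']:.0f}, X after: {after['X']:.0f}, shift: {shift:.0f}")
```

Under proportional allocation, the change in weight translates one-for-one into the share of resources moved, which is the point the comment is making about why the step 2 weights matter.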

richard_ngo

Jun 24 2022

Yepp, thanks for the clear rephrasing. My original arguments for this view were pretty messy because I didn't have it fully fleshed out in my mind before writing this comment thread, I just had a few underlying intuitions about ways I thought Ben was going wrong. Upon further reflection I think I'd make two changes to your rephrasing. First change: in your rephrasing, we assign people weights based on the quality of their beliefs, but then follow their recommended policies. But any given way of measuring the quality of beliefs (in terms of novelty, track record, etc) is only an imperfect proxy for quality of policies. For example, Kurzweil might very presciently predict that compute is the key driver of AI progress, but suppose (for the sake of argument) that the way he does so is by having a worldview in which everything is deterministic, individuals are powerless to affect the future, etc. Then you actually don't want to give many resources to Kurzweil's policies, because Kurzweil might have no idea which policies make any difference. So I think I want to adjust the rephrasing to say: in principle we should assign people weights based on how well their past recommended policies for someone like you would have worked out, which you can estimate using things like their track record of predictions, novelty of ideas, etc. But notably, the quality of past recommended policies is often not very sensitive to credences! For example, if you think that there's a 50% chance of solving nanotech in a decade, or a 90% chance of solving nanotech in a decade, then you'll probably still recommend working on nanotech (or nanotech safety) either way. Having said all that, since we only get one rollout, evaluating policies is very high variance. And so looking at other information like reasoning, predictions, credences, etc, helps you distinguish between "good" and "lucky". But fundamentally we should think of these as approximations to policy evaluation, at least if you're assum

Rohin Shah

Jun 25 2022

In your Kurzweil example I think the issue is not that you assigned weights based on hypothetical-Kurzweil's beliefs, but that hypothetical-Kurzweil is completely indifferent over policies. I think the natural fix is "moral parliament" style decision making where the weights can still come from beliefs but they now apply more to preferences-over-policies. In your example hypothetical-Kurzweil has a lot of weight but never has any preferences-over-policies so doesn't end up influencing your decisions at all. That being said, I agree that if you can evaluate quality of past recommended policies well, without a ton of noise, that would be a better signal than accuracy of beliefs. This just seems extremely hard to do, especially given the selection bias in who comes to your attention in the first place, and idk how I'd do it for Eliezer in any sane way. (Whereas you get to see people state many more beliefs and so there are a lot more data points that you can evaluate if you look at beliefs.) I think you're thinking way too much about credences-in-particular. The relevant notion is not "credences", it's that-which-determines-how-much-influence-the-person-has-over-your-actions. In this model of deference the relevant notion is the weights assigned in step 2 (however you calculate them), and the message of Ben's post would be "I think people assign too high a weight to Eliezer", rather than anything about credences. I don't think either Ben or I care particularly much about credences-based-on-deference except inasmuch as they affect your actions. I do agree that Ben's post looks at credences that Eliezer has given and considers those to be relevant evidence for computing what weight to assign Eliezer. You could take a strong stand against using people's credences or beliefs to compute weights, but that is at least a pretty controversial take (that I personally don't agree with), and it seems different from what you've been arguing so far (except possibly in the parent

richard_ngo

Jun 28 2022

Your procedure is non-robust in the sense that, if Kurzweil transitions from total indifference to thinking that one policy is better by epsilon, he'll throw his full weight behind that policy. Hmm, but then in a parliamentary approach I guess that if there are a few different things he cares epsilon about, then other policies could negotiate to give him influence only over the things they don't care about themselves. Weighting by hypothetical-past-impact still seems a bit more elegant, but maybe it washes out. (If we want to be really on-policy then I guess the thing which we should be evaluating is whether the person's worldview would have had good consequences when added to our previous mix of worldviews. And one algorithm for this is assigning policies weights by starting off from a state where they don't know anything about the world, then letting them bet on all your knowledge about the past (where the amount they win on bets is determined not just by how correct they are, but also how much they disagree with other policies). But this seems way too complicated to be helpful in practice.) I think I'm happy with people spending a bunch of time evaluating accuracy of beliefs, as long as they keep in mind that this is a proxy for quality of recommended policies. Which I claim is an accurate description of what I was doing, and what Ben wasn't: e.g. when I say that credences matter less than coherence of worldviews, that's because the latter is crucial for designing good policies, whereas the former might not be; and when I say that all-things-considered estimates of things like "total risk level" aren't very important, that's because in principle we should be aggregating policies not risk estimates between worldviews. I also agree that selection bias could be a big problem; again, I think that the best strategy here is something like "do the standard things while remembering what's a proxy for what".

Rohin Shah

Jun 28 2022

Meta: This comment (and some previous ones) get a bunch into "what should deference look like", which is interesting, but I'll note that most of this seems unrelated to my original claim, which was just "deference* seems important for people making decisions now, even if it isn't very important in practice for researchers", in contradiction to a sentence on your top-level comment. Do you now agree with that claim?

*Here I mean deference in the sense of how-much-influence-various-experts-have-over-your-actions. I initially called this "credences" because I thought you were imagining a model of deference in which literal credences determined how much influence experts had over your actions.

Agreed, but I'm not too worried about that. It seems like you'll necessarily have some edge cases like this; I'd want to see an argument that the edge cases would be common before I switch to something else.

The chain of approximations could look something like:

1. The correct thing to do is to consider all actions / policies and execute the one with the highest expected impact.
2. First approximation: Since there are so many actions / policies, it would take too long to do this well, and so we instead take a shortcut and consider only those actions / policies that more experienced people have thought of, and execute the ones with the highest expected impact. (I'm assuming for now that you're not in the business of coming up with new ideas of things to do.)
3. Second approximation: Actually it's still pretty hard to evaluate the expected impact of the restricted set of actions / policies, so we'll instead do the ones that the experts say is highest impact. Since the experts disagree, we'll divide our resources amongst them, in accordance with our predictions of which experts have highest expected impact across their portfolios of actions. (This is assuming a large enough pile of resources that it makes sense to diversify due to diminishing marginal returns for any one expert

richard_ngo · Jun 28 2022 · 11

Meta: I'm currently writing up a post with a fully-fleshed-out account of deference. If you'd like to drop this thread and engage with that when it comes out (or drop this thread without engaging with that), feel free; I expect it to be easier to debate when I've described the position I'm defending in more detail.

I'll note that most of this seems unrelated to my original claim, which was just "deference* seems important for people making decisions now, even if it isn't very important in practice for researchers", in contradiction to a sentence on your top-level comment. Do you now agree with that claim?

I always agreed with this claim; my point was that the type of deference which is important for people making decisions now should not be very sensitive to the "specific credences" of the people you're deferring to. You were arguing above that the difference between your and Eliezer's views makes much more than a 2x difference; do you now agree that, on my account of deference, a big change in the deference-weight you assign to Eliezer plausibly leads to a much smaller change in your policy from the perspective of other worldviews, because the Eliezer-worldview trades off influence ... (read more)

Rohin Shah

Jun 29 2022

I was arguing that EV estimates have more than a 2x difference; I think this is pretty irrelevant to the deference model you're suggesting (which I didn't know you were suggesting at the time). No, I don't agree with that. It seems like all the worldviews are going to want resources (money / time) and access to that is ~zero-sum. (All the worldviews want "get more resources" so I'm assuming you're already doing that as much as possible.) The bargaining helps you avoid wasting resources on counterproductive fighting between worldviews, it doesn't change the amount of resources each worldview gets to spend. Going from allocating 10% of your resources to 20% of your resources to a worldview seems like a big change. It's a big difference if you start with twice as much money / time as you otherwise would have, unless there just happens to be a sharp drop in marginal utility of resources between those two points for some reason. Maybe you think that there are lots of things one could do that have way more effect than "redirecting 10% of one's resources" and so it's not a big deal? If so can you give examples? I agree overconfidence is common and you shouldn't literally calculate a Brier score to figure out who to defer to. I agree that directionally-correct beliefs are better correlated than calibrated credences. When I say "evaluate beliefs" I mean "look at stated beliefs and see how reasonable they look overall, taking into account what other people thought when the beliefs were stated" and not "calculate a Brier score"; I think this post is obviously closer to the former than the latter. I agree that people's other goals make it harder to evaluate what their "true beliefs" are, and that's one of the reasons I say it's only 3/10 correlation. Re: correlation, I was implicitly also asking the question "how much does this vary across experts". Across the general population, maybe coherence is 7/10 correlated with expected future impact; across the experts that one

richard_ngo

Jul 13 2022

I've now written up a more complete theory of deference here. I don't expect that it directly resolves these disagreements, but hopefully it's clearer than this thread. Note that this wouldn't actually make a big change for AI alignment, since we don't know how to use more funding. It'd make a big change if we were talking about allocating people, but my general heuristic is that I'm most excited about people acting on strong worldviews of their own, and so I think the role of deference there should be much more limited than when it comes to money. (This all falls out of the theory I linked above.) Experts are coherent within the bounds of conventional study. When we try to apply that expertise to related topics that are less conventional (e.g. ML researchers on AGI; or even economists on what the most valuable interventions are) coherence drops very sharply. (I'm reminded of an interview where Tyler Cowen says that the most valuable cause area is banning alcohol, based on some personal intuitions.) The question is how it compares to estimating past correctness, where we face pretty similar problems. But mostly I think we don't disagree too much on this question - I think epistemic evaluations are gonna be bigger either way, and I'm mostly just advocating for the "think-of-them-as-a-proxy" thing, which you might be doing but very few others are.

Rohin Shah

Jul 14 2022

Funding isn't the only resource:

* You'd change how you introduce people to alignment (since I'd guess that has a pretty strong causal impact on what worldviews they end up acting on). E.g. if you previously flipped a 10%-weighted coin to decide whether to send them down the Eliezer track or the other track, now you'd flip a 20%-weighted coin, and this straightforwardly leads to different numbers of people working on particular research agendas that the worldviews disagree about. Or if you imagine the community as a whole acting as an agent, you send 20% of the people to MIRI fellowships and the remainder to other fellowships (whereas previously it would be 10%).
* (More broadly I think there's a ton of stuff you do differently in community building, e.g. do you target people who know ML or people who are good at math?)
* You'd change what you used political power for. I don't particularly understand what policies Eliezer would advocate for but they seem different, e.g. I think I'm more keen on making sure particular alignment schemes for building AI systems get used and less keen on stopping everyone from doing stuff besides one secrecy-oriented lab that can become a leader.

Yeah, that's what I mean.

Rohin Shah

Jun 24 2022

Responding to other more minor points: I mean that he predicts that these costly actions will not be taken despite seeming good to him. I think it's also important to consider Ben's audience. If I were Ben I'd be imagining my main audience to be people who give significant deference weight to Eliezer's actual worldview. If you're going to write a top-level comment arguing against Ben's post it seems pretty important to engage with the kind of deference he's imagining (or argue that no one actually does that kind of deference, or that it's not worth writing to that audience, etc). (Of course, I could be wrong about who Ben imagines his audience to be.)

-1

Verden

Jun 23 2022

This survey suggests that he was at 96-98% a year ago.

RobBensinger

Jun 23 2022

Why do you think it suggests that? There are two MIRI responses in that range, but responses are anonymous, and most MIRI staff didn't answer the survey.

Verden

Jun 23 2022

I should have clarified that I think (or at least I thought so, prior to your question; kind of confused now) Yudkowsky's answer is probably one of those two MIRI responses. Sorry about that. I recall you or somebody else at MIRI once wrote something along the lines that most MIRI researchers don't actually believe that p(doom) is extremely high, like >90% doom. Then, in the linked post, there is a comment from someone who marked themselves both as a technical safety and strategy researcher and who gave 0.98, 0.96 on your questions. The style/content of the comment struck me as something Yudkowsky would have written.

RobBensinger

Jun 24 2022

Cool! I figured your reasoning was probably something along those lines, but I wanted to clarify that the survey is anonymous and hear your reasoning. I personally don't know who wrote the response you're talking about, and I'm very uncertain how many researchers at MIRI have 90+% p(doom), since only five MIRI researchers answered the survey (and marked that they're from MIRI).

RobBensinger

Jun 23 2022

I could be wrong, but I'd guess Eliezer's all-things-considered p(doom) is less extreme than that.

richard_ngo

Jun 23 2022

Yeah, I'm gonna ballpark guess he's around 95%? I think the problem is that he cites numbers like 99.99% when talking about the chance of doom "without miracles", which in his parlance means assuming that his claims are never overly pessimistic. Which seems like wildly bad epistemic practice. So then it goes down if you account for that, and then maybe it goes down even further if he adjusts for the possibility that other people are more correct than him overall (although I'm not sure that's a mental move he does at all, or would ever report on if he did).

Rohin Shah

Jun 23 2022

Even at 95% you get OOMs of difference by my calculations, though significantly fewer OOMs, so this doesn't seem like the main crux.

kokotajlod

Jun 20 2022

Beat me to it & said it better than I could.

My now-obsolete draft comment was going to say:

It seems to me that between about 2004 and 2014, Yudkowsky was the best person in the world to listen to on the subject of AGI and AI risks. That is, deferring to Yudkowsky would have been a better choice than deferring to literally anyone else in the world. Moreover, after about 2014 Yudkowsky would probably have been in the top 10; if you are going to choose 10 people to split your deference between (which I do not recommend, I recommend thinking for oneself), Yudkowsky should be one of those people and had you dropped Yudkowsky from the list in 2014 you would have missed out on some important stuff. Would you agree with this?

On the positive side, I'd be interested to see a top ten list from you of people you think should be deferred to as much or more than Yudkowsky on matters of AGI and AI risks.*

*What do I mean by this? Idk, here's a partial operationalization: Timelines, takeoff speeds, technical AI alignment, and p(doom).

[ETA: lest people write me off as a Yudkowsky fanboy, I wish to emphasize that I too think people are overindexing on Yudkowsky's views, I too think there ar...

Habryka

Jun 20 2022

Didn't you post that comment right here?

kokotajlod

Jun 20 2022

Oops! Dunno what happened, I thought it was not yet posted. (I thought I had posted it at first, but then I looked for it and didn't see it & instead saw the unposted draft, but while I was looking for it I saw Richard's post... I guess it must have been some sort of issue with having multiple tabs open. I'll delete the other version.)

Dawn Drescher

Jun 21 2022

I agree, and I’m a bit confused that the top-level post does not violate forum rules in its current form. There is a version of the post – rephrased and reframed – that I think would be perfectly fine even though I would still disagree with it. And I say that as someone who loved Paul’s response to Eliezer’s list! Separately, my takeaway from Ben’s 80k interview has been that I think that Eliezer’s take on AI risk is much more truth-tracking than Ben’s. To improve my understanding, I would turn to Paul and ARC’s writings rather than Eliezer and MIRI’s, but Eliezer’s takes are still up there among the most plausible ones in my mind. I suspect that the motivation for this post comes from a place that I would find epistemically untenable and that bears little semblance to the sophisticated disagreement between Eliezer and Paul. But I’m worried that a reader may come away with the impression that Ben and Paul fall into one camp and Eliezer into another on AI risk when really Paul agrees with Eliezer on many points when it comes to the importance and urgency of AI safety (see the list of agreements at the top of Paul’s post).

Stefan_Schubert

Jun 21 2022

I agree, and I’m a bit confused that the top-level post does not violate forum rules in its current form.

That seems like a considerable overstatement to me. I think it would be bad if the forum rules said an article like this couldn't be posted.

Dawn Drescher

Jun 21 2022

Maybe, but I find it important to maintain the sort of culture where one can be confidently wrong about something without fear that it’ll cause people to interpret all future arguments only in light of that mistake instead of taking them at face value and evaluating them for their own merit. The sort of entrepreneurialness that I still feel is somewhat lacking in EA requires committing a lot of time to a speculative idea on the off-chance that it is correct. If it is not, the entrepreneur has wasted a lot of time and usually money. If additionally it has the social cost that they can't try again because people will dismiss them because of that past failure, it makes it just so much less likely still that anyone will try in the first place. Of course that’s not the status quo. I just really don’t want EA to move in that direction.

Stefan_Schubert

Jun 21 2022

If anything, I think that prohibiting posts like this from being published would have a more detrimental effect on community culture.

Of course, people are welcome to criticise Ben's post - which some in fact do. That's a very different category from prohibition.

Dawn Drescher

Jun 21 2022

Yeah, that sounds perfectly plausible to me.

“A bit confused” wasn’t meant to be any sort of rhetorical pretend understatement or something. I really just felt a slight surprise that caused me to check whether the forum rules contain something about ad hom, and found that they don’t. It may well be the right call on balance. I trust the forum team on that.

bgarfinkel

Jun 21 2022

I really appreciate the time people have taken to engage with this post (and actually hope the attention cost hasn’t been too significant). I decided to write some post-discussion reflections on what I think this post got right and wrong.

The reflections became unreasonably long - and almost certainly should be edited down - but I’m posting them here in a hopefully skim-friendly format. They cover what I see as some mistakes with the post, first, and then cover some views I stand by.

Things I would do differently in a second version of the post:

1. I would either drop the overall claim about how much people should defer to Yudkowsky — or defend it more explicitly

At the start of the post, I highlight the two obvious reasons to give Yudkowsky's risk estimates a lot of weight: (a) he's probably thought more about the topic than anyone else and (b) he developed many of the initial AI risk arguments. I acknowledge that many people, justifiably, treat these as important factors when (explicitly or implicitly) deciding how much to defer to Yudkowsky.

Then the post gives some evidence that, at each stage of his career, Yudkowsky has made a dramatic, seemingly overconfident prediction about tec...

RobBensinger

Jun 23 2022

I noted some places I agree with your comment here, Ben. (Along with my overall take on the OP.)

Some additional thoughts:

Notably, since that post didn’t really have substantial arguments in it (although the later one did), I think the fact it had an impact is seemingly a testament to the power of deference

The “death with dignity” post came in the wake of Eliezer writing hundreds of thousands of words about why he thinks alignment is hard in the Late 2021 MIRI Conversations (in addition to the many specific views and arguments about alignment difficulty he’s written up in the preceding 15+ years). So it seems wrong to say that everyone was taking it seriously based on deference alone.

The post also has a lot of content beyond “p(doom) is high”. Indeed, I think the post’s focus (and value-add) is mostly in its discussion of rationalization, premature/excessive conditionalizing, and ethical injunctions, not in the bare assertion that p(doom) is high. Eliezer was already saying pretty similar stuff about p(doom) back in September.

I’d make it clearer that my main claim is: it would have been unreasonable to assign a very high credence to fast take-offs back in (e.g.) the early- or mid-20

...

Oliver Sourbut

Jun 25 2022

I just wanted to state agreement that it seems a large number of people largely misread Death with Dignity, at least according to what seems to me the most plausible intended message: mainly about the ethical injunctions (which are very important as a finitely-rational and prone-to-rationalisation being), as Yudkowsky has written of in the past.

The additional detail of 'and by the way this is a bad situation and we are doing badly' is basically modal Yudkowsky schtick and I'm somewhat surprised it updated anyone's beliefs (about Yudkowsky's beliefs, and therefore their all-things-considered-including-deference beliefs).

I think if he had been a little more audience-aware he might have written it differently. Then again maybe not, if the net effect is more attention and investment in AI safety - and more recent posts and comments suggest he's more willing than before to use certain persuasive techniques to spur action (which seems potentially misguided to me, though understandable).

MichaelStJules

Jun 23 2022

The “death with dignity” post came in the wake of Eliezer writing hundreds of thousands of words about why he thinks alignment is hard in the Late 2021 MIRI Conversations (in addition to the many specific views and arguments about alignment difficulty he’s written up in the preceding 15+ years). So it seems wrong to say that everyone was taking it seriously based on deference alone.

I think "deference alone" is a stronger claim than the one we should worry about. People might read the arguments on either side (or disproportionately Eliezer's arguments), but then defer largely to Eliezer's weighing of arguments because of his status/position, confidence, references to having complicated internal models (that he often doesn't explain or link explanations to), or emotive writing style.

What share of people with views similar to Eliezer's do you expect to have read these conversations? They're very long, not well organized, and have no summaries/takeaways. The format seems pretty bad if you value your time.

I think the AGI Ruin: A List of Lethalities post was formatted pretty accessibly, but that came after death with dignity.

Also, insofar as Paul recently argued for X and Eliezer respond

...

Owen Cotton-Barratt

Jun 21 2022

I really appreciated this update. Mostly it checks out to me, but I wanted to push back on this:

Here’s a dumb thought experiment: Suppose that Yudkowsky wrote all of the same things, but never published them. But suppose, also, that a freak magnetic storm ended up implanting all of the same ideas in his would-be-readers’ brains. Would this absence of a causal effect count against deferring to Yudkowsky? I don’t think so. The only thing that ultimately matters, I think, is his track record of beliefs - and the evidence we currently have about how accurate or justified those beliefs were.

It seems to me that a good part of the beliefs I care about assessing are the beliefs about what is important. When someone has a track record of doing things with big positive impact, that's some real evidence that they have truth-tracking beliefs about what's important. In the hypothetical where Yudkowsky never published his work, I don't get the update that he thought these were important things to publish, so he doesn't get credit for being right about that.

Yonatan Cale

Jun 22 2022

There's also (imperfect) information in "lots of smart people thought about EY's opinions and agree with him" that you don't get from the freak magnetic storm scenario.

Habryka

Jun 23 2022

I appreciate this update!

Then the post gives some evidence that, at each stage of his career, Yudkowsky has made a dramatic, seemingly overconfident prediction about technological timelines and risks - and at least hasn’t obviously internalised lessons from these apparent mistakes.

I am confused about you bringing in the claim of "at each stage of his career", given that the only two examples you cited that seemed to provide much evidence here were from the same (and very early) stage of his career. Of course, you might have other points of evidence that point in this direction, but I did want to provide some additional pushback on the "at each stage of his career" point, which I think you didn't really provide evidence for.

I do think finding evidence for each stage of his career would of course be time-consuming, and I understand that you didn't really want to go through all of that, but it seemed good to point out explicitly.

Ultimately, I don’t buy the comparison. I think it’s really out-of-distribution for someone in their late teens and early twenties to pro-actively form the view that an emerging technology is likely to kill everyone within a decade, found an

...

richard_ngo

Jun 23 2022

Thanks for writing this update. I think my number one takeaway here is something like: when writing a piece with the aim of changing community dynamics, it's important to be very clear about motivations and context. E.g. I think a version of the piece which said "I think people are overreacting to Death with Dignity, here are my specific models of where Yudkowsky tends to be overconfident, here are the reasons why I think people aren't taking those into account as much as they should" would have been much more useful and much less controversial than the current piece, which (as I interpret it) essentially pushes a general "take Yudkowsky less seriously" meme (and is thereby intrinsically political/statusy).

Yonatan Cale

Jun 22 2022

I'm a bit confused about a specific small part:

tendency toward expressing dramatic views

I imagine that for many people, including me (including you?), once we work on [what we believe to be] preventing the world from ending, we would only move to another job if it was also preventing the world from ending, probably in an even more important way.

In other words, I think "working at a 2nd x-risk job and believing it is very important" is mainly predicted by "working at a 1st x-risk job and believing it is very important", much more than by personality traits.

This is almost testable, given we have lots of people working on x-risk today and believing it is very important. But maybe you can easily put your finger on what I'm missing?

[anonymous]

Jun 22 2022

For what it's worth, I found this post and the ensuing comments very illuminating. As a person relatively new to both EA and the arguments about AI risk, I was a little bit confused as to why there was not much pushback on the very high-confidence beliefs about AI doom within the next 10 years. My assumption had been that there was a lot of deference to EY because of reverence and fealty stemming from his role in getting the AI alignment field started, not to mention the other ways he has shaped people's thinking. I also assumed that his track record on predictions was just ambiguous enough for people not to question his accuracy. Given that I don't give much credence to the idea that prophets/oracles exist, I thought it unlikely that the high confidence in his predictions was warranted, since there doesn't seem to be much evidence supporting the accuracy of long-range forecasts. I did not think that there were such glaring mispredictions made by EY in the past, so thank you for highlighting them.

Verden

Jun 22 2022

I feel like people are missing one fairly important consideration when discussing how much to defer to Yudkowsky, etc. Namely, I've heard multiple times that Nate Soares, the executive director of MIRI, has models of AI risk that are very similar to Yudkowsky's, and their p(doom) are also roughly the same. My limited impression is that Soares is no less smart or otherwise capable than Yudkowsky. So, when having this kind of discussion, focusing on Yudkowsky's track record or whatever, I think it's good to remember that there's another very smart person, who entered AI safety much later than Yudkowsky, and who holds very similar inside views on AI risk.

Gavin

Jun 22 2022

This isn't much independent evidence I think: seems unlikely that you could become director of MIRI unless you agreed. (I know that there's a lot of internal disagreement at other levels.)

Verden

Jun 22 2022

My point has little to do with him being the director of MIRI per se. I suppose I could be wrong about this, but my impression is that Nate Soares is among the top 10 most talented/insightful people with an elaborate inside view and years of research experience in AI alignment. He also seems to agree with Yudkowsky on a whole lot of issues and predicts about the same p(doom) for about the same reasons. And I feel that many people don't give enough thought to the fact that while e.g. Paul Christiano has interacted a lot with Yudkowsky and disagreed with him on many key issues (while agreeing on many others), there's also Nate Soares, who broadly agrees with Yudkowsky's models that predict very high p(doom). Another, more minor point: if someone is bringing up Yudkowsky's track record in the context of his extreme views on AI risk, it seems helpful to talk about Soares' track record as well.

Guy Raveh

Jun 22 2022

I think this maybe argues against a point not made in the OP. Garfinkel isn't saying "disregard Yudkowsky's views" - rather he's saying "don't give them extra weight just because Yudkowsky's the one saying them". For example, from his reply to Richard Ngo: So at least from Garfinkel's perspective, Yudkowsky and Soares do count as data points, they're just equal in weight to other relevant data points. (I'm not expressing any of my own, mostly unformed, views here)

RobBensinger

Jun 23 2022

Ben has said this about Eliezer, but not about Nate, AFAIK.

Dr. David Mathers

Jun 22 2022

'Here’s one data point I can offer from my own life: Through a mixture of college classes and other reading, I’m pretty confident I had already encountered the heuristics and biases literature, Bayes’ theorem, Bayesian epistemology, the ethos of working to overcome bias, arguments for the many worlds interpretation, the expected utility framework, population ethics, and a number of other ‘rationalist-associated’ ideas before I engaged with the effective altruism or rationalist communities.'

I think some of this is just a result of being a community founded partly by analytic philosophers. (though as a philosopher I would say that!). I think it's normal to encounter some of these ideas in undergrad philosophy programs. At my undergrad back in 2005-09 there was a whole upper-level undergraduate course in decision theory. I don't think that's true everywhere all the time, but I'd be surprised if it was wildly unusual. I can't remember if we covered population ethics in any class, but I do remember discovering Parfit on the Repugnant Conclusion in 2nd-year of undergrad because one of my ethics lecturers said Reasons and Persons was a super-important book.

In terms of the Oxford phil scene where the term "effective altruism" was born, the main titled professorship in ethics at that time was held by John Broome, a utilitarianism-sympathetic former economist, who had written famous stuff on expected utility theory. I can't remember if he was the PhD supervisor of anyone important to the founding of EA, but I'd be astounded if some of the phil. people involved in that had not been reading his stuff and talking to him about it.

Most of the phil. physics people at Oxford were gung-ho for many worlds, it's not a fringe view in philosophy of physics as far as I know. (Though I think Oxford was kind of a centre for it and there was more dissent elsewhere.)

As far as I can tell, Bayesian epistemology in at least some senses of that term is a fairly well-known approach in phi

Pablo

Jun 22 2022

the main titled professorship in ethics at that time was held by John Broome, a utilitarianism-sympathetic former economist, who had written famous stuff on expected utility theory. I can't remember if he was the PhD supervisor of anyone important to the founding of EA, but I'd be astounded if some of the phil. people involved in that had not been reading his stuff and talking to him about it.

Indeed, Broome co-supervised the doctoral theses of both Toby Ord and Will MacAskill. And Broome was, in fact, the person who advised Will to get in touch with Toby, before the two had met.

Linch

Jun 24 2022

Speaking for myself, I was interested in a lot of the same things in the LW cluster (Bayes, approaches to uncertainty, human biases, utilitarianism, philosophy, avoiding the news) before I came across LessWrong or EA. The feeling is much more like "I found people who can describe these ideas well" than "oh these are interesting and novel ideas to me." (I had the same realization when I learned about utilitarianism...much more of a feeling that "this is the articulation of clearly correct ideas, believing otherwise seems dumb").

That said, some of the ideas on LW that seemed more original to me (AI risk, logical decision theory stuff, heroic responsibility in an inadequate world), do seem both substantively true and extremely important, and it took me a lot of time to be convinced of this.

(There are also other ideas that I'm less sure about, like cryonics and MW).

Guy Raveh

Jun 22 2022

Veering entirely off-topic here, but how does the many worlds hypothesis tie in with all the rest of the rationality/EA stuff?

Yonatan Cale

Jun 22 2022

[replying only to you with no context] EY pointed out the many worlds hypothesis as a thing that even modern science, specifically physics (which is considered a very well functioning science, it's not like social psychology), is missing. And he used this as an example to get people to stop trusting authority, including modern science, which many people around him seem to trust. I think this is a reasonable reference.

Guy Raveh

Jun 22 2022

Can't say any of that makes sense to me. I have the feeling there's some context I'm totally missing (or he's just wrong about it). I may ask you about this in person at some point :)

[anonymous]

Jul 4 2022

Edit: I think this came off more negatively than I intended it to, particularly about Yudkowsky's understanding of physics. The main point I was trying to make is that Yudkowsky was overconfident, not that his underlying position was wrong. See the replies for more clarification.

I think there's another relevant (and negative) data point when discussing Yudkowsky's track record: his argument and belief that the Many-Worlds Interpretation of quantum mechanics is the only viable interpretation of quantum mechanics, and anyone who doesn't agree is essentially a moron. Here's one 2008 link from the Sequences where he expresses this position[1]; there are probably many other places where he's said similar things. (To be clear, I don’t know if he still holds this belief, and if he doesn’t anymore, when and why he updated away from it.)

Many Worlds is definitely a viable and even leading interpretation, and may well be correct. But Yudkowsky's confidence in Many Worlds, as well as his conviction that people who disagree with him are making elementary mistakes, is more than a little disproportionate, and may come partly from a lack of knowledge and expertise.

The above is a paraphrase of Scott Aaronson, a credible authority on quantum mechanics who is sympathetic to both Yudkowsky and Many Worlds (bold added):

While this isn't directly related to AI risk, I think it's relevant to Yudkowsky's track record as a public intellectual.

1. ^ He expresses this in the last six paragraphs of the post. I'm excerpting some of it (bold added, italics were present in the original):

Steven Byrnes

Jul 5 2022

OTOH, I am (or I guess was?) a professional physicist, and when I read Rationality A-Z, I found that Yudkowsky was always reaching exactly the same conclusions as me whenever he talked about physics, including areas where (IMO) the physics literature itself is a mess—not only interpretations of QM, but also how to think about entropy & the 2nd law of thermodynamics, and, umm, I thought there was a third thing too but I forget.

That increased my respect for him quite a bit.

And who the heck am I? Granted, I can’t out-credential Scott Aaronson in QM. But FWIW, hmm let’s see, I had the highest physics GPA in my Harvard undergrad class and got the highest preliminary-exam score in my UC Berkeley physics grad school class, and I’ve played a major role in designing I think 5 different atomic interferometers (including an atomic clock) for various different applications, and in particular I was always in charge of all the QM calculations related to estimating their performance, and also I once did a semester-long (unpublished) research project on quantum computing with superconducting qubits, and also I have made lots of neat wikipedia QM diagrams and explanations including a pedag...

[anonymous]

Jul 5 2022

I agree that: Yudkowsky has an impressive understanding of physics for a layman, in some situations his understanding is on par with or exceeds some experts, and he has written explanations of technical topics that even some experts like and find impressive. This includes not just you, but also e.g. Scott Aaronson, who praised his series on QM in the same answer I excerpted above, calling it entertaining, enjoyable, and getting the technical stuff mostly right. He also praised it for its conceptual goals. I don't believe this is faint praise, especially given stereotypes of amateurs writing about physics. This is a positive part of Yudkowsky's track record. I think my comment sounds more negative about Yudkowsky's QM sequence than it deserves, so thanks for pushing back on that.

I'm not sure what you mean when you call yourself a pro-MWI extremist but in any case AFAIK there are physicists, including one or more prominent ones, who think MWI is really the only explanation that makes sense, although there are obviously degrees in how fervently one can hold this position and Yudkowsky seems at the extreme end of the scale in some of his writings. And he is far from the only one who thinks Copenhagen is ridiculous. These two parts of Yudkowsky's position on MWI are not without parallel within professional physicists, and the point about Copenhagen being ridiculous is probably a point in his favor from most views (e.g. Nobel laureate Murray Gell-Mann said that Niels Bohr brainwashed people into Copenhagen), let alone this community. Perhaps I should have clarified this in my comment, although I did say that MWI is a leading interpretation and may well be correct.

The negative aspects I said in my comment were:

1. Yudkowsky's confidence in MWI is disproportionate
2. Yudkowsky's conviction that people who disagree with him are making elementary mistakes is disproportionate
3. These may come partly from a lack of knowledge or expertise

Maybe (3) is a little unfair

Steven Byrnes

Jul 6 2022

Hmm, I’m a bit confused where you’re coming from.

Suppose that the majority of eminent mathematicians believe 5+5=10, but a significant minority believes 5+5=11. Also, out of the people in the 5+5=10 camp, some say “5+5=10 and anyone who says otherwise is just totally wrong”, whereas other people say “I happen to believe that the balance of evidence is that 5+5=10, but my esteemed colleagues are reasonable people and have come to a different conclusion, so we 5+5=10 advocates should approach the issue with appropriate humility, not overconfidence.”

In this case, the fact of the matter is that 5+5=10. So in terms of who gets the most credit added to their track-record, the ranking is:

1st place: The ones who say “5+5=10 and anyone who says otherwise is just totally wrong”,
2nd place: The ones who say “I think 5+5=10, but one should be humble, not overconfident”,
3rd place: The ones who say “I think 5+5=11, but one should be humble, not overconfident”,
Last place: The ones who say “5+5=11 and anyone who says otherwise is just totally wrong”.

Agree so far?

(See also: Bayes’s theorem, Brier score, etc.)
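
As a rough illustration of the ranking above, here is a minimal sketch using the Brier score. The specific credences (0.99 for the "totally wrong" camps, 0.70 for the "humble" camps) are made-up assumptions chosen only to stand in for "very confident" and "moderately confident"; nothing in the comment pins them down.

```python
# Hypothetical credences that "5+5=10" is true.
# The 0.99 / 0.70 values are illustrative assumptions, not figures from the thread.
forecasters = {
    "confident and right": 0.99,
    "humble and right": 0.70,
    "humble and wrong": 0.30,
    "confident and wrong": 0.01,
}


def brier_score(p_true: float, outcome: int = 1) -> float:
    """Squared error between the stated probability and what happened (lower is better)."""
    return (p_true - outcome) ** 2


# The claim turned out true, so outcome = 1 for everyone.
scores = {name: brier_score(p) for name, p in forecasters.items()}

# Sort from best (lowest) to worst (highest) score.
ranking = sorted(scores, key=scores.get)

for place, name in enumerate(ranking, start=1):
    print(place, name, round(scores[name], 4))
```

Under these assumed numbers the ordering matches the 1st-to-last-place list. Other proper scoring rules, such as the log score, give the same ordering here, though they penalize confident misses more heavily.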

Back to the issue here. Yudkowsky is claiming “MWI, and anyone who says otherwis...

RobBensinger

Jul 6 2022

'The more probability someone assigns to a claim, the more credit they get when the claim turns out to be true' is true as a matter of Bayesian math. And I agree with you that MWI is true, and that we have enough evidence to say it's true with very high confidence, if by 'MWI' we just mean a conjunction like "Objective collapse is false." and "Quantum non-realism is false / the entire complex amplitude is in some important sense real". (I think Eliezer had a conjunction like this in mind when he talked about 'MWI' in the Sequences; he wasn't claiming that decoherence explains the Born rule, and he certainly wasn't claiming that we need to reify 'worlds' as a fundamental thing. I think a better term for MWI might be the 'Much World Interpretation', since the basic point is about how much stuff there is, not about a division of that stuff into discrete 'worlds'.) That said, I have no objection in principle to someone saying 'Eliezer was right about MWI (and gets more points insofar as he was correct), but I also dock him more points than he gained because I think he was massively overconfident'. E.g., imagine someone who assigns probability 1 (or probability .999999999) to a coin flip coming up heads. If the coin then comes up heads, then I'm going to either assume they were trolling me, or I'm going to infer that they're very bad at reasoning. Even if they somehow rigged the coin, .999999999 is just too extreme a probability to be justified here. By the same logic, if Eliezer had said that MWI is true with probability 1, or if he'd put too many '9s' at the end of his .99... probability assignment, then I'd probably dock him more points than he gained for being object-level-correct. (Or I'd at least assume he has a terrible understanding of how Bayesian probability works. Someone could indeed be very miscalibrated and bad at talking in probabilistic terms, and yet be very knowledgeable and correct on object-level questions like MWI.) I'm not sure exactly how many

Steven Byrnes

Jul 6 2022

Fair enough, thanks.

[anonymous]

Jul 6 2022

Here's my point: There is a rational limit to the amount of confidence one can have in MWI (or any belief). I don't know where exactly this limit is for MWI-extremism but Yudkowsky clearly exceeded it sometimes. To use made up numbers, suppose:

* MWI is objectively correct
* Eliezer says P(MWI is correct) = 0.9999999
* But rationally one can only reach P(MWI) = 0.999
* Because there are remaining uncertainties that cannot be eliminated through superior thinking and careful consideration, such as a lack of experimental evidence, the possibility of QM getting overturned, the possibility of a new and better interpretation in the future, and unknown unknowns.
* These factors add up to at least P(Not MWI) = 0.001.

Then even though Eliezer is correct about MWI being correct, he is still significantly overconfident in his belief about it.

Consider Paul's example of Eliezer saying MWI is comparable to heliocentrism:

I agree with Paul here. Heliocentrism is vastly more likely than any particular interpretation of quantum mechanics, and Eliezer was wrong to have made this comparison.

This may sound like I'm nitpicking, but I think it fits into a pattern of Eliezer making dramatic and overconfident pronouncements, and it's relevant information for people to consider e.g. when evaluating Eliezer's belief that p(doom) = ~1 and the AI safety situation is so hopeless that the only thing left is to die with slightly more dignity. Of course, it's far from the only relevant data point.

Regarding (2), I think we're on the same page haha.
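
To give a sense of scale for the made-up numbers in the comment above (0.9999999 stated versus 0.999 as the hypothetical rational limit), here is a small sketch that compares the two in log-odds. Measuring the gap in orders of magnitude of odds is an assumption of this sketch, not something the commenter proposes.

```python
import math


def log10_odds(p: float) -> float:
    """Base-10 log of the odds p / (1 - p)."""
    return math.log10(p / (1.0 - p))


# Made-up numbers taken from the comment above.
stated = 0.9999999   # hypothetical stated credence in MWI
justified = 0.999    # hypothetical maximum rationally reachable credence

# Difference in orders of magnitude between the two sets of odds.
gap = log10_odds(stated) - log10_odds(justified)

print(f"stated odds against:    about 1 in {1 / (1 - stated):,.0f}")
print(f"justified odds against: about 1 in {1 / (1 - justified):,.0f}")
print(f"gap in log10 odds:      {gap:.2f}")
```

On these illustrative figures the two credences look close as percentages but differ by roughly four orders of magnitude in odds, which is one way of stating the overconfidence being described.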

RobBensinger

Jul 6 2022

Could someone point to the actual quotes where Eliezer compares heliocentrism to MWI? I don't generally assume that when people are 'comparing' two very-high-probability things, they're saying they have the same probability. Among other things, I'd want confirmation that 'Eliezer and Paul assign roughly the same probability to MWI, but they have different probability thresholds for comparing things to heliocentrism' is false. E.g., if I compare Flat Earther beliefs, beliefs in psychic powers, belief 'AGI was secretly invented in the year 2000', geocentrism, homeopathy, and theism to each other, it doesn't follow that I'd assign the same probabilities to all of those six claims, or even probabilities that are within six orders of magnitude of each other. In some contexts it might indeed Griceanly imply that all six of those things pass my threshold for 'unlikely enough that I'm happy to call them all laughably silly views', but different people have their threshold for that kind of thing in different places.

Steven Byrnes

Jul 7 2022

Gotcha, thanks. I guess we have an object-level disagreement: I think that careful thought reveals MWI to be unambiguously correct, with enough 9’s as to justify Eliezer’s tone. And you don’t. ¯\_(ツ)_/¯ (Of course, this is bound to be a judgment call; e.g. Eliezer didn’t state how many 9’s of confidence he has. It’s not like there’s a universal convention for how many 9’s are enough 9’s to state something as a fact without hedging, or how many 9’s are enough 9’s to mock the people who disagree with you.)

[anonymous]

Jul 8 2022

Yes, agreed. Let me lay out my thinking in more detail. I mean this to explain my views in more detail, not as an attempt to persuade.

Paul's account of Aaronson's view says that Eliezer shouldn't be as confident in MWI as he is, which in words sounds exactly like my point, and similar to Aaronson's Stack Exchange answer. But it still leaves open the question of how overconfident he was, and what, if anything, should be taken away from this. It's possible that there's a version of my point which is true but is also uninteresting or trivial (who cares if Yudkowsky was 10% too confident about MWI 15 years ago?).

And it's worth reiterating that a lot of people give Eliezer credit for his writing on QM, including for being forceful in his views. I have no desire to argue against this. I had hoped to sidestep discussing this entirely since I consider it to be a separate point, but perhaps this was unfair and led to miscommunication. If someone wants to write a detailed comment/post explaining why Yudkowsky deserves a lot of credit for his QM writing, including credit for how forceful he was at times, I would be happy to read it and would likely upvote/strong upvote it depending on quality.

However, here my intention was to focus on the overconfidence aspect. I'll explain what I see as the epistemic mistakes Eliezer likely made to end up in an overconfident state.

Why do I think Eliezer was overconfident on MWI? (Some of the following may be wrong.)

* He didn't understand non-MWI-extremist views, which should have rationally limited his confidence
* I don't have sources for this, but I think something like this is true.
* This was an avoidable mistake
* Worth noting that Eliezer has updated towards the competence of elites in science since some of his early writing, according to Rob's comment elsewhere in this thread
* It's possible that his technical understanding was uneven. This should also have limited his confidence.
* Aaronson praised him fo

Steven Byrnes

Jul 10 2022

For what it's worth, consider the claim “The Judeo-Christian God, the one who listens to prayers and so on, doesn't exist.” I have such high confidence in this claim that I would absolutely state it as a fact without hedging, and psychoanalyze people for how they came to disagree with me. Yet there's a massive theology literature arguing to the contrary of that claim, including by some very smart and thoughtful people, and I've read essentially none of this theology literature, and if you asked me to do an anti-atheism ITT I would flunk it catastrophically. I'm not sure what lesson you'll take from that; for all I know you yourself are very religious, and this anecdote will convince you that I have terrible judgment. But if you happen to be on the same page as me, then maybe this would be an illustration of the fact that (I claim) one can rationally and correctly arrive at extremely-confident beliefs without it needing to pass through a deep understanding and engagement with the perspectives of the people who disagree with you. I agree that this isn’t too important a conversation, it’s just kinda interesting. :)

Paul_Christiano

Jul 9 2022

I'm not sure either of the quotes you cited by Eliezer require or suggest ridiculous overconfidence. If I've seen some photos of a tiger in town, and I know a bunch of people in town who got eaten by an animal, and we've all seen some apparent tiger-prints near where people got eaten, I may well say "it's obvious there is a tiger in town eating people." If people used to think it was a bear, but that belief was formed based on priors when we didn't yet have any hard evidence about the tiger, I may be frustrated with people who haven't yet updated. I may say "The only question is how quickly people's views shift from bear to tiger. Those who haven't already shifted seem like they are systematically slow on the draw and we should learn from their mistakes." I don't think any of those statements imply I think there's a 99.9% chance that it's a tiger. It's more a statement rejecting the reasons why people think there is a bear, and disagreeing with those reasons, and expecting their views to predictably change over time. But I could say all that while still acknowledging some chance that the tiger is a hoax, that there is a new species of animal that's kind of like a tiger, that the animal we saw in photos is different from the one that's eating people, or whatever else. The exact smallness of the probability of "actually it wasn't the tiger after all" is not central to my claim that it's obvious or that people will come around. I don't think it's central to this point, but I think 99% is a defensible estimate for many-worlds. I would probably go somewhat lower but certainly wouldn't run victory laps about that or treat it as damning of someone's character. The above is mostly a bad analogy explaining why I think it's pretty reasonable to say things like Eliezer did even if your all-things-considered confidence was 99% or even lower. To get a sense for what Eliezer finds frustrating and intends to critique, you can read If many-worlds had come first (which I find qui

Paul_Christiano

Jul 5 2022

This doesn't feel like a track record claim to me. Nothing has changed since Eliezer wrote that; it reads as reasonably now as it did then; and we have nothing objective against which to evaluate it. I broadly agree with Eliezer that (i) collapse seems unlikely, (ii) if the world is governed by QM as we understand it, the whole state is probably as "real" as we are, (iii) there seems to be nothing to favor the alternative interpretations other than those that make fewer claims and are therefore more robust to unknown-unknowns. So if anything I'd be inclined to give him a bit of credit on this one, given that it seems to have held up fine for readers who know much more about quantum mechanics than he did when writing the sequence. The main way the sequence felt misleading was by moderately overstating how contrarian this take was. For example, near the end of my PhD I was talking with Scott Aaronson and my advisor Umesh Vazirani, who I considered not-very-sympathetic to many worlds. When asked why, my recollection of his objection was "What are these 'worlds' that people are talking about? There's just the state." That is, the whole issue turned on a (reasonable) semantic objection. However, I do think Eliezer is right that in some parts of physics collapse is still taken very seriously and there are more-than-semantic disagreements. For example, I was pretty surprised by David Griffiths' discussion of collapse in the afterword of his textbook (pdf) during undergrad. I think that Eliezer is probably right that some of these are coming from a pretty confused place. I think the actual situation with respect to consensus is a bit muddled, and e.g. I would be fairly surprised if Eliezer was able to make a better prediction about the result of any possible experiment than the physics community based on his confidence in many-worlds. But I also think that a naive-Paul perspective of "no way anyone is as confused as Eliezer is saying" would have been equally-unreasonabl

[anonymous]

Jul 5 2022

When I said it was relevant to his track record as a public intellectual, I was referring to his tendency to make dramatic and overconfident pronouncements (which Ben mentioned in the parent comment). I wasn't intending to imply that the debate around QM had been settled or that new information had come out. I do think that even at the time Eliezer's positions on both MWI and why people disagreed with him on it were overconfident though. I think you're right that my comment gave too little credit to Eliezer, and possibly misleadingly implied that Eliezer is the only one who holds some kind of extreme MWI or anti-collapse view or that such views are not or cannot be reasonable (especially anti-collapse). I said that MWI is a leading candidate but that's still probably underselling how many super pro-MWI positions there are. I expanded on this in another comment. Your story of Eliezer comparing MWI to heliocentrism is a central example of what I'm talking about. It is not that his underlying position is wrong or even unlikely, but that he is significantly overconfident. I think this is relevant information for people trying to understand Eliezer's recent writings. To be clear, I don't think it's a particularly important example, and there is a lot of other more important information than whether Eliezer overestimated the case for MWI to some degree while also displaying impressive understanding of physics and possibly/probably being right about MWI.

Habryka

Jun 19 2022

It seems that half of these examples are from 15+ years ago, from a period for which Eliezer has explicitly disavowed his opinions (and the ones that are not strike me as most likely correct, like treating coherence arguments as forceful and that AI progress is likely to be discontinuous and localized and to require relatively little compute).

Let's go example-by-example:

1. Predicting near-term extinction from nanotech

This critique strikes me as about as sensible as digging up someone's old high-school essays and critiquing their stance on communism or the criminal justice system. I want to remind any reader that this is an opinion from 1999, when Eliezer was barely 20 years old. I am confident I can find crazier and worse opinions for every single leadership figure in Effective Altruism, if I am willing to go back to what they thought while they were in high-school. To give some character, here are some things I believed in my early high-school years:

The economy was going to collapse because the U.S. was establishing a global surveillance state
Nuclear power plants are extremely dangerous and any one of them is quite likely to explode in a given year
We could have e

...

Pablo

Jun 19 2022

It seems that half of these examples are from 15+ years ago, from a period for which Eliezer has explicitly disavowed his opinions

Just to note that the boldfaced part has no relevance in this context. The post is not attributing these views to present-day Yudkowsky. Rather, it is arguing that Yudkowsky's track record is less flattering than some people appear to believe. You can disavow an opinion that you once held, but this disavowal doesn't erase a bad prediction from your track record.

Habryka

Jun 19 2022

Hmm, I think that part definitely has relevance. Clearly we would trust Eliezer less if his response to that past writing was "I just got unlucky in my prediction, I still endorse the epistemological principles that gave rise to this prediction, and would make the same prediction, given the same evidence, today".

If someone visibly learns from forecasting mistakes they make, that should clearly update us positively on them not repeating the same mistakes.

bgarfinkel

Jun 19 2022

If someone visibly learns from forecasting mistakes they make, that should clearly update us positively on them not repeating the same mistakes.

I suppose one of my main questions is whether he has visibly learned from the mistakes, in this case.

For example, I wasn't able to find a post or comment to the effect of "When I was younger, I spent years of my life motivated by the belief that near-term extinction from nanotech was looming. I turned out to be wrong. Here's what I learned from that experience and how I've applied it to my forecasts of near-term existential risk from AI." Nor could I find a post or comment acknowledging his previous over-optimistic AI timelines and what he learned from them when formulating his current, seemingly short, AI timelines.

(I genuinely could be missing these, since he has so much public writing.)

Habryka

Jun 20 2022

Eliezer writes a bit about his early AI timeline and nanotechnology opinions here, though it sure is a somewhat obscure reference that takes a bunch of context to parse:

Luke Muehlhauser reading a previous draft of this (only sounding much more serious than this, because Luke Muehlhauser): You know, there was this certain teenaged futurist who made some of his own predictions about AI timelines -

Eliezer: I'd really rather not argue from that as a case in point. I dislike people who screw up something themselves, and then argue like nobody else could possibly be more competent than they were. I dislike even more people who change their mind about something when they turn 22, and then, for the rest of their lives, go around acting like they are now Very Mature Serious Adults who believe the thing that a Very Mature Serious Adult believes, so if you disagree with them about that thing they started believing at age 22, you must just need to wait to grow out of your extended childhood.
Luke Muehlhauser (still being paraphrased): It seems like it ought to be acknowledged somehow.
Eliezer: That's fair, yeah, I can see how someone might think it was

...

Guy Raveh

Jun 20 2022

How would the forerunners of effective altruism in 1999 know about putting probability distributions on forecasts? I haven't told them to do that yet!

Did Yudkowsky actually write these sentences?

If Yudkowsky thinks, as this suggests, that people in EA think or do things because he tells them to - this alone means it's valuable to question whether people give him the right credibility.

Habryka

Jun 20 2022

I am not sure about the question. Yeah, this is a quote from the linked post, so he wrote those sections. Also, yeah, it seems like Eliezer has had a very large effect on whether this community uses things like probability distributions, models things in a Bayesian way, makes lots of bets, and pays attention to things like forecasting track records. I don't think he gets to take full credit for those norms, but my guess is he is the single individual who most gets to take credit for those norms.

Guy Raveh

Jun 20 2022

I am not sure about the question.

I wanted to make sure I'm not missing something, since this shines a negative light on him IMO.

There's a difference between saying, for example, "You can't expect me to have done X then - nobody was doing it, and I haven't even written about it yet, nor was I aware of anyone else doing so" - and saying "... nobody was doing it because I haven't told them to."

This isn't about credit. It's about self-perception and social dynamics.

John G. Halstead

Jun 21 2022

I don't see how he has encouraged people to pay attention to forecasting track records. People who have encouraged that norm make public bets or go on public forecasting platforms and make predictions about questions that can resolve in the short term. Bryan Caplan does this; I think Greg Lewis and David Manheim are superforecasters.

I thought the upshot of this piece and the Jotto post was that Yudkowsky is in fact very dismissive of people who make public forecasts. "I consider naming particular years to be a cognitively harmful sort of activity; I have refrained from trying to translate my brain's native intuitions about this into probabilities, for fear that my verbalized probabilities will be stupider than my intuitions if I try to put weight on them." This seems like the opposite of encouraging people to pay attention to forecasting but is rather dismissing the whole enterprise of forecasting.

HaydnBelfield

Jun 20 2022

More than Philip Tetlock (author of Superforecasting)?

Does that particular quote from Yudkowsky not strike you as slightly arrogant?

David Johnston

Jun 20 2022

FWIW I think "it was 20 years ago" is a good reason not to take these failed predictions too seriously, and "he has disavowed these predictions after seeing they were false" is a bad reason to take them unseriously.

bgarfinkel

Jun 19 2022

On 1 (the nanotech case):

I want to remind any reader that this is an opinion from 1999, when Eliezer was barely 20 years old.

I think your comment might give the misimpression that I don't discuss this fact in the post or explain why I include the case. What I write is:

I should, once again, emphasize that Yudkowsky was around twenty when he did the final updates on this essay. In that sense, it might be unfair to bring this very old example up.
Nonetheless, I do think this case can be treated as informative, since: the belief was so analogous to his current belief about AI (a high outlier credence in near-term doom from an emerging technology), since he had thought a lot about the subject and was already highly engaged in the relevant intellectual community, since it's not clear when he dropped the belief, and since twenty isn't (in my view) actually all that young. I do know a lot of people in their early twenties; I think their current work and styles of thought are likely to be predictive of their work and styles of thought in the future, even though I do of course expect the quality to go up over time....

An additional reason why I think it's worth distinguishing between his...

Habryka

Jun 20 2022

One quick response, since it was easy (might respond more later):

Overall, then, I do think it's fair to consider a fast-takeoff to be a core premise of the classic arguments. It wasn't incidental or a secondary consideration.

I do think takeoff speeds between 1 week and 10 years are a core premise of the classic arguments. I do think the situation looks very different if we spend 5+ years in the human domain, but I don't think there are many who believe that that is going to happen.

I don't think the distinction between 1 week and 1 year is that relevant to the core argument for AI Risk, since it seems in either case more than enough cause for likely doom, and that premise seems very likely to be true to me. I do think Eliezer believes things more on the order of 1 week than 1 year, but I don't think the basic argument structure is that different in either case (though I do agree that the 1 year opens us up to some more potential mitigating strategies).

Jan_Kulveit

Jun 20 2022

(i.e. most people who are likely to update downwards on Yudkowsky on the basis of this post, seem to me to be generically too trusting, and I am confident I can write a more compelling post about any other central figure in Effective Altruism that would likely cause you to update downwards even more)

My impression is that the post is a somewhat unfortunate attempt to "patch" the situation in which many generically too trusting people updated a lot on AGI Ruin: A List of Lethalities and Death with Dignity and subsequent deference/update cascades.

In my view the deeper problem here is that, instead of disagreeing about model internals, many of these people do some sort of "averaging conclusions" move, based on signals like seniority, karma, vibes, etc.

Many of these signals are currently wildly off from truth-tracking, so you get attempts to push the conclusion-updates directly.

Linch

Jun 20 2022

This critique strikes me as about as sensible as digging up someone's old high-school essays and critiquing their stance on communism or the criminal justice system. I want to remind any reader that this is an opinion from 1999, when Eliezer was barely 20 years old. I am confident I can find crazier and worse opinions for every single leadership figure in Effective Altruism, if I am willing to go back to what they thought while they were in high-school. To give some character, here are some things I believed in my early high-school years

This is really minor and nitpicky, and I agree with many of your overall points, but I don't think equivocating between "barely 20" and "early high-school" is fair. The former is a normal age to be a third-year university student in the US, and plenty of college-age EAs are taken quite seriously by the rest of us.

Habryka

Jun 21 2022

Oh, hmm, I think this is just me messing up the differences between the U.S. and German education systems (I was 18 and 19 in high-school, and enrolled in college when I was 20).

I think the first quote on nanotechnology was actually written in 1996 originally (though it was maybe updated in 1999), which would put Eliezer at ~17 years old when he wrote that.

The second quote was I think written in more like 2000, which would put him more in the early college years, and I agree that it seems good to clarify that.

Linch

Jun 21 2022

Thank you, this clarification makes sense to me!

Paul_Christiano

Jul 5 2022

e.g. Paul Christiano has also said that Hanson's predictions looked particularly bad in the FOOM debate

To clarify, what I said was:

I don't think Eliezer has an unambiguous upper hand in the FOOM debate at all

Then I listed a bunch of ways in which the world looks more like Robin's predictions, particularly regarding continuity and locality. I said Robin's predictions about AI timelines in particular looked bad. This isn't closely related to the topic of your section 3, where I mostly agree with the OP.

Habryka

Jul 6 2022

Hmm, I think this is fair, rereading that comment. I feel a bit confused here, since at the scale that Robin is talking about, timelines and takeoff speeds seem very inherently intertwined (like, if Robin predicts really long timelines, this clearly implies a much slower takeoff speed, especially when combined with gradual continuous increases). I agree there is a separate competitiveness dimension that you and Robin are closer on, which is important for some of the takeoff dynamics, but on overall takeoff speed, I feel like you are closer to Eliezer than Robin (Eliezer predicting weeks to months to cross the general intelligence human->superhuman gap, you predicting single-digit years to cross that gap, and Hanson predicting decades to cross that gap). Though it's plausible that I am missing something here. In any case, I agree that my summary of your position here is misleading, and will edit accordingly.

Paul_Christiano

Jul 6 2022

I think my views about takeoff speeds are generally similar to Robin's though neither Robin nor Eliezer got at all concrete in that discussion so I can't really say. You can read this essay from 1998 with his "outside-view" guesses, which I suspect are roughly in line with what he's imagining in the FOOM debate. I think that doc implies significant probability on a "slow" takeoff of 8, 4, 2... year doublings (more like the industrial revolution), but a broad distribution over dynamics which also puts significant probability on e.g. a relatively fast jump to a 1 month doubling time (more like the agricultural revolution). In either case, over the next few doublings he would by default expect still further acceleration. Overall I think this is basically a sensible model. (I agree that shorter timelines generally suggest faster takeoff, but I think either Robin or Eliezer's views about timelines would be consistent with either Robin or Eliezer's views about takeoff speed.)
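One way to read the "8, 4, 2... year doublings" pattern is as a geometric series. The sketch below assumes, purely for illustration, that each doubling takes exactly half as long as the previous one and that the pattern simply continues; neither the comment nor the linked essay commits to that. The function name `years_elapsed` and the default first doubling time are hypothetical choices for this example.

```python
def years_elapsed(doublings: int, first: float = 8.0) -> float:
    """Total years after the given number of doublings, when each
    doubling takes half as long as the one before it (assumed pattern)."""
    return sum(first / 2 ** k for k in range(doublings))

# Cumulative time after 1, 2, 3, 5 and 10 doublings under the assumed pattern.
for n in (1, 2, 3, 5, 10):
    print(n, years_elapsed(n))
```

Under this assumption the cumulative time never reaches 16 years (twice the first doubling time), no matter how many doublings occur. That is one way to see why even a "slow" halving pattern still implies further acceleration over the next few doublings.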

Guy Raveh

Jun 20 2022

If done in a polite and respectful manner, I think this would be a genuinely good idea.

gwern

Jun 19 2022

Not sure why this is on EAF rather than LW or maybe AF, but anyway. I find this interesting to look at because I have been following Eliezer's work since approximately 2003 on SL4, and so I remember this firsthand, as it were. I disagree with several of the evaluations here (but of course agree with several of the others - I found the premise of Flare to be ludicrous at the time, and thankfully, AFAICT, pretty much zero effort went into that vaporware*):

calling LOGI and related articles 'wrong' because that's not how DL looks right now is itself wrong. Yudkowsky has never said that DL or evolutionary approaches couldn't work, or that all future AI work would look like the Bayesian program and logical approach he favored; he's said (consistently since at least SL4 that I've observed) that they would be extremely dangerous when they worked, and extremely hard to make safe to the high probability that we need them to when deployed to the real world indefinitely and unboundedly and self-modifyingly, and that rigorous program-proof approaches which can make formal logical guarantees of 100% safety are what are necessary and must deal with the issues and concepts discussed in LOGI. I

...

RyanCarey

Jun 19 2022

like Bostrom's influential Superintelligence - Eliezer with the serial numbers filed off and an Oxford logo added

It's not accurate that the key ideas of Superintelligence came to Bostrom from Eliezer, who originated them. Rather, at least some of the main ideas came to Eliezer from Nick. For instance, in one message from Nick to Eliezer on the Extropians mailing list, dated Dec 6th 1998, inline quotations show Eliezer arguing that it would be good to allow a superintelligent AI system to choose its own morality. Nick responds that it's possible for an AI system to be highly intelligent without being motivated to act morally. In other words, Nick explains to Eliezer an early version of the orthogonality thesis.

Nick was not lagging behind Eliezer on evaluating the ideal timing of a singularity, either - the same thread reveals that they both had some grasp of the issue. Nick said that the fact that 150,000 people die per day must be contextualised against "the total number of sentiences that have died or may come to live", foreshadowing his piece on Astronomical Waste, which would be published five years later. Eliezer said that having waited billions of years, the probability of a...

Ben Pace

Jun 19 2022

I think chapter 4, The Kinetics of an Intelligence Explosion, has a lot of terms and arguments from EY's posts in the FOOM Debate. (I've been surprised by this in the past, thinking Bostrom invented the terms, then finding things like resource overhangs getting explicitly defined in the FOOM Debate.)

bgarfinkel

Jun 19 2022

Thanks for the comment! A lot of this is useful.

calling LOGI and related articles 'wrong' because that's not how DL looks right now is itself wrong. Yudkowsky has never said that DL or evolutionary approaches couldn't work, or that all future AI work would look like the Bayesian program and logical approach he favored;

I mainly have the impression that LOGI and related articles were probably "wrong" because, so far as I've seen, nothing significant has been built on top of them in the intervening decade-and-a-half (even though LOGI's successor was seemingly predicted to make it possible for a small group to build AGI). It doesn't seem like there's any sign that these articles were the start of a promising path to AGI that was simply slower than the deep learning path.

I have had the impression, though, that Yudkowsky also thought that logical/Bayesian approaches were in general more powerful/likely-to-enable-near-term-AGI (not just less safe) than DL. It's totally possible this is a misimpression - and I'd be inclined to trust your impression over mine, since you've read more of his old writing than I have. (I'd also be interested if you happen to have any links handy.) But I'm not...

DirectedEvolution

Jun 20 2022

I'm going to break a sentence from your comment here into bits for inspection. Also, emphasis and elisions mine.

We don't have a formalism to describe what "agency" is. We do have several posts trying to define it on the Alignment Forum:

* Gradations of Agency
* Optimality is the tiger, and agents are its teeth
* Agency and Coherence

While it might not be the best choice, I'm going to use Gradations of Agency as a definition, because it's more systematic in its presentation. "Level 3" is described as "Armed with this ability you can learn not just from your own experience, but from the experience of others - you can identify successful others and imitate them." This doesn't seem like what any ML model does. So we can look at "Level 2," which gives the example "You start off reacting randomly to inputs, but you learn to run from red things and towards green things because when you ran towards red things you got negative reward and when you ran towards green things you got positive reward." This seems like how all ML works.

So using the "Gradations of Agency" framework, we might view individual ML systems as improving in power and generality within a single level of agency. But they don't appear to be changing levels of agency. They aren't identifying other successful ML models and imitating them.

Gradations of Agency doesn't argue whether or not there is an asymptote of power and generality within each level. Is there a limit to the power and generality possible within level 2, where all ML seems to reside? This seems to be the crux of the issue. If DL is approaching an asymptote of power and generality below that of AGI as model and data sizes increase, then this cuts directly against Yudkowsky's predictions. On the other hand, if we think that DL can scale to AGI through model and data size increases alone, then that would be right in line with his predictions.

A 10 trillion parameter model now exists, and it's been suggested that a 100 trillion parame

kokotajlod

Jun 20 2022

Re gradations of agency: Level 3 and level 4 seem within reach IMO. IIRC there are already some examples of neural nets being trained to watch other actors in some simulated environment and then imitate them. Also, model-based planning (i.e. level 4) is very much a thing, albeit something that human programmers seem to have to hard-code. I predict that within 5 years there will be systems which are unambiguously in level 3 and level 4, even if they aren't perfect at it (hey, we humans aren't perfect at it either).

Charles He

Jun 20 2022

This sounds like straightforward transfer learning (TL) or fine-tuning, common in 2017. So you could just write 15 lines of Python which shops between some set of pretrained weights and sees how they perform. Often TL is many times (1000x) faster than training from random weights and only needs a few examples. As speculation: it seems like in one of the agent simulations you can just have agents grab other agents' weights or layers and try them out in a strategic way (when they detect an impasse or new environment or something). There is an analogy to biology where species alternate between asexual vs sexual reproduction, and trading of genetic material occurs during periods of adversity. (This is trivial, I’m sure a second year student has written a lot more.) This doesn’t seem to fit any sort of agent framework or improve agency though. It just makes you train faster.
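As a rough illustration of the "shop between pretrained weights" idea, here is a toy sketch in plain Python. Everything in it is hypothetical: the candidate weight tuples stand in for pretrained checkpoints, the model is a one-feature linear predictor rather than a neural net, and `pick_best_weights` is a name invented for this example, not any library's API.

```python
from typing import Dict, List, Tuple

Weights = Tuple[float, float]  # (slope, intercept) of a toy linear model

def mean_squared_error(w: Weights, data: List[Tuple[float, float]]) -> float:
    """Average squared error of the linear model w on (x, y) pairs."""
    slope, intercept = w
    return sum((slope * x + intercept - y) ** 2 for x, y in data) / len(data)

def pick_best_weights(candidates: Dict[str, Weights],
                      data: List[Tuple[float, float]]) -> str:
    """Return the name of the candidate with the lowest error on the data."""
    return min(candidates,
               key=lambda name: mean_squared_error(candidates[name], data))

# Hypothetical "pretrained" weights, standing in for checkpoints from other tasks.
candidates = {
    "task_a": (2.0, 0.0),
    "task_b": (-1.0, 3.0),
    "random_init": (0.1, -0.2),
}

# A handful of examples from the new task (generated here by y = 2x + 0.5).
few_shot = [(0.0, 0.5), (1.0, 2.5), (2.0, 4.5)]

best = pick_best_weights(candidates, few_shot)
print(best, mean_squared_error(candidates[best], few_shot))
```

This only covers the selection step. In an actual transfer-learning setup one would then continue training from the selected weights on the new task, which is where the speedup over random initialization would come from.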

Charles He

Jun 20 2022

Eh, there seems like a connection to interpretability. For example, if the ML architecture “were modular+categorized or legible to the agents”, they would more quickly and effectively swap weights or models. So there might be some way where legibility can emerge by selection pressure in an environment where say, agents had limited capacity to store weights or data, and had to constantly and extensively share weights with each other. You could imagine teams of agents surviving and proliferating by a shared architecture that let them pass this data fluently in the form of weights. To make sure the transmission mechanism itself isn’t crazy baroque you can, like, use some sort of regularization or something. I’m 90% sure this is a shower thought but like it can’t be worse than “The Great Reflection”.

Locke

Aug 10 2022

n00b q: What's AF?

Linch

Aug 10 2022

Alignment Forum (for technical discussions about AI alignment)

Evan R. Murphy

Aug 10 2022

It's short for the Alignment Forum: https://www.alignmentforum.org/

2

[anonymous]

Jun 23 2022

One obvious answer is that the LW community and mods tend to defer to Yudkowsky more than the EAF community does. (This doesn't argue whether the deference is good or bad, but I think this difference is a fact about reality.)

-31

Charles He

Jun 19 2022

52

bgarfinkel

Jun 20 2022

A general reflection: I wonder if one at least minor contributing factor to disagreement, around whether this post is worthwhile, is different understandings about who the relevant audience is.

I mostly have in mind people who have read and engaged a little bit with AI risk debates, but not yet in a very deep way, and would overall be disinclined to form strong independent views on the basis of (e.g.) simply reading Yudkowsky's and Christiano's most recent posts. I think the info I've included in this post could be pretty relevant to these people, since in practice they're often going to rely a lot -- consciously or unconsciously; directly or indirectly -- on cues about how much weight to give different prominent figures' views. I also think that the majority of members of the existential risk community are in this reference class.

I think the info in this post isn't nearly as relevant to people who've consumed and reflected on the relevant debates very deeply. The more you've engaged with and reflected on an issue, the less you should be inclined to defer -- and therefore the less relevant track records become.

(The limited target audience might be something I don't do a good enough job communicating in the post.)

23

kokotajlod

Jun 20 2022

I think that insofar as people are deferring on matters of AGI risk etc., Yudkowsky is in the top 10 people in the world to defer to based on his track record, and arguably top 1. Nobody who has been talking about these topics for 20+ years has a similarly good track record. If you restrict attention to the last 10 years, then Bostrom does and Carl Shulman and maybe some other people too (Gwern?), and if you restrict attention to the last 5 years then arguably about a dozen people have a somewhat better track record than him.

(To my knowledge. I think I'm probably missing a handful of people who I don't know as much about because their writings aren't as prominent in the stuff I've read, sorry!)

He's like Szilard. Szilard wasn't right about everything (e.g. he predicted there would be a war and the Nazis would win) but he was right about a bunch of things including that there would be a bomb, that this put all of humanity in danger, etc. and importantly he was the first to do so by several years.

I think if I were to write a post cautioning people against deferring to Yudkowsky, I wouldn't talk about his excellent track record but rather about his arrogance, inability to clearly... (read more)

37

Rohin Shah

Jun 19 2022

> See Rohin Shah’s (I think correct) objection to the use of “coherence arguments” to support AI risk concerns.

Fwiw I'd say this somewhat differently.

I object to a specific way in which one could use coherence arguments to support AI risk: namely, "AI is intelligent --> AI satisfies coherence arguments better than we do --> AI looks as though it is maximizing a utility function from our perspective --> Convergent instrumental subgoals --> Doom".

As far as I know, anyone who has spent ~an hour reading my post and thinking about it basically agrees with that particular narrow point.

This doesn't rule out other ways that one could use coherence arguments to support AI risk, such as "coherence arguments show that achieving stuff can typically be factored into beliefs about the world and goals that you want to achieve; since we'll be building AIs to achieve stuff, it seems likely they'll work by having separated beliefs and goals; if they have bad goals, then we die because of convergent instrumental subgoals". I'm more sympathetic to this argument (though not nearly as much as Eliezer appears to be).

I agree that the intro talk that you link to would likely cause people to think... (read more)

32

Dr. David Mathers

Jun 21 2022

Several thoughts:

- I'm not sure I can argue for this, but it feels weird and off-putting to me that all this energy is being spent discussing how good a track-record one guy has, especially one guy with a very charismatic and assertive writing-style, and a history of attempting to provide very general guidance for how to think across all topics (though I guess any philosophical theory of rationality does the last thing.) It just feels like a bad sign to me, though that could just be for dubious social reasons.
- The question of how much to defer to E.Y. isn't answered just by things like "he has possibly the best track record in the world on this issue." If he's out of step with other experts, and by a long way, we need to have reason to think he outperforms the aggregate of experts before we weight him more than the aggregate, and it's entirely normal, I'd have thought, for the aggregate to significantly outperform the single best individual. (I'm not making as strong a claim as that the best individual outperforming the aggregate is super-unusual and unlikely.) Of course if you think he's nearly as good as the aggregate, then you should still move a decent amount in his directi

... (read more)

-29

Charles He

Jun 21 2022

32

splinter

Jun 26 2022

The negative reactions to this post are disheartening. I have a degree of affectionate fondness for the parodic levels of overthinking that characterize the EA community, but here you really see the downsides of that overthinking concretely.

Of course it is meaningful that Eliezer Yudkowsky has made a bunch of terrible predictions in the past that closely echo predictions he continues to make in slightly different form today. Of course it is relevant that he has neither owned up to those earlier terrible predictions nor explained how he has learned from those mistakes. Of course we should be more skeptical of similar claims he makes in the future. Of course we should pay more attention to broader consensus or aggregate predictions in the field than to outlier predictions.

This is sensible advice in any complex domain, and saying that we should "evaluate every argument in isolation on its merits" is a type of special pleading or sophistry. Sometimes (often!) the obvious conclusions are the correct ones: even extraordinarily clever people are often wrong; extreme claims that other knowledgeable experts disagree with are often wrong; and people who make extreme claims that prove to... (read more)

13

RobBensinger

Jun 28 2022

> Of course it is meaningful that Eliezer Yudkowsky has made a bunch of terrible predictions in the past that closely echo predictions he continues to make in slightly different form today.

I assume you're mainly talking about young-Eliezer worrying about near-term risk from molecular nanotechnology, and current-Eliezer worrying about near-term risk from AGI?

I think age-17 Eliezer was correct to think widespread access to nanotech would be extremely dangerous. See my comment. If you or Ben disagree, why do you disagree?

Age-20 Eliezer was obviously wrong about the timing for nanotech, and this is obviously Bayesian evidence for 'Eliezer may have overly-aggressive tech timelines in general'.

I don't think this is generally true -- e.g., if you took a survey of EAs worried about AI risk in 2010 or in 2014, I suspect Eliezer would have longer AI timelines than others at the time. (E.g., he expected it to take longer to solve Go than Carl Shulman did.) When I joined MIRI, the standard way we summarized MIRI's view was roughly 'We think AI risk is high, but not because we think AGI is imminent; rather, our worry is that alignment is likely to take a long time, and that civilization may need ... (read more)

32

Guy Raveh

Jun 20 2022

I think the effect should depend on your existing view. If you've always engaged directly with Yudkowsky's arguments and chose the ones that convinced you, there's nothing to learn. If you thought he was a unique genius and always assumed you weren't convinced of things because he understood things you didn't know about, and believed him anyway, maybe it's time to dial it back. If you'd always assumed he's wrong about literally everything, it should be telling for you that OP had to go 15 years back for good examples.

Writing this comment actually helped me understand how to respond to the OP myself.

Dr. David Mathers

Jun 21 2022

'If you'd always assumed he's wrong about literally everything, it should be telling for you that OP had to go 15 years back to get good examples.' How strong this evidence is also depends on whether he has made many resolvable predictions in the last 15 years, right? If he hasn't, it's not very telling. To be clear, I genuinely don't know if he has or hasn't.

Guy Raveh

Jun 21 2022

Sounds reasonable. Though predictions aren't the only thing one can be demonstrably wrong about.

27

Guy Raveh

Jun 20 2022

Some off-topic comments, not specific to you or Yudkowsky:

> the belief was so analogous to his current belief about AI... since he had thought a lot about the subject and was already highly engaged in the relevant intellectual community

- It seems to me (but I could be mistaken) like I see the phrase "has thought a lot about X" fairly often in EA contexts, where it is taken to imply being very well-informed about X. I don't think this is good reasoning. Thinking about something is probably required for understanding it well, but is certainly not enough.
- When an idea or theory is very fringe, there's a strong selection effect for people in the relevant intellectual community. This means even their average views are sometimes not good evidence for something. For example, to answer a question about the probability of doom from AI in this century, are alignment researchers a good reference class? They all naturally believe AI is an existential risk to begin with. I'm not sure I have the solution, since "AI researchers in general" isn't a good reference class either - many might have not given any thought to whether AI is dangerous.

2

[anonymous]

Jun 21 2022

Strong +1 on this. It in fact seems like the more someone thinks about something and takes a public position on it with strong confidence, the more incentive they have to stick to that position. That's why making explicit forecasts and creating a forecasting track record is so important in countering this tendency. If arguments cannot be resolved by events happening in the real world, then there is not much incentive for one to change their mind, especially if it's about something speculative and abstract that one can generate arguments for ad infinitum by engaging in more speculation.

On your example: the question of AI existential risk this century seems downstream of the question of the probability of AGI this century, and one can find some potential reference classes for that: AI safety research, general AI research, computer science research, scientific research, technological innovation, etc. None of these are perfect reference classes, but they are at least something to work with. Contingent on AGI being possible this century, one can form an opinion on how low or high the probability of doom would need to be to warrant concern.

27

iporphyry

Jun 19 2022

I like that you admit that your examples are cherry-picked. But I'm actually curious what a non-cherry-picked track record would show. Can people point to Yudkowsky's successes? What did he predict better than other people? What project did MIRI generate that either solved clearly interesting technical problems or got significant publicity in academic/AI circles outside of rationalism/EA? Maybe instead of a comment here this should be a short-form question on the forum.

31

Matthew_Barnett

Jun 19 2022

> I like that you admit that your examples are cherry-picked. But I'm actually curious what a non-cherry-picked track record would show. Can people point to Yudkowsky's successes?

While he's not single-handedly responsible, he led the movement to take AI risk seriously at a time when approximately no one was talking about it, and that movement has now attracted the interest of top academics. This isn't a complete track record, but it's still a very important data-point. It's a bit like if he were the first person to say that we should take nuclear war seriously, and then five years later people are starting to build nuclear bombs and academics realize that nuclear war is very plausible.

bgarfinkel

Jun 19 2022

I definitely do agree with that! It's possible I should have emphasized the significance of it more in the post, rather than moving on after just a quick mention at the top. If it's of interest: I say a little more about how I think about this, in response to Gwern's comment below. (To avoid thread-duplicating, people might want to respond there rather than here if they have follow-on thoughts on this point.) My further comment is:

23

RobBensinger

Jun 23 2022

I work at MIRI, but as usual, this comment is me speaking for myself, and I haven’t heard from Eliezer or anyone else on whether they'd agree with the following.

My general thoughts:

The primary things I like about this post are that (1) it focuses on specific points of disagreement, encouraging us to then hash out a bunch of object-level questions; and (2) it might help wake some people from their dream if they hero-worship Eliezer, or if they generally think that leaders in this space can do no wrong.
- By "hero-worshipping" I mean a cognitive algorithm, not a set of empirical conclusions. I'm generally opposed to faux egalitarianism and the Modest-Epistemology reasoning discussed in Inadequate Equilibria: if your generalized anti-hero-worship defenses force the conclusion that there just aren't big gaps in skills or knowledge (or that skills and knowledge always correspond to mainstream prestige and authority), then your defenses are ruling out reality a priori. In saying "people need to hero-worship Eliezer less", I'm opposing a certain kind of reasoning process and mindset, not a specific factual belief like "Eliezer is the clearest thinker about AI risk".
  
  In a sense, I want to prom

... (read more)

RobBensinger

Jun 29 2022

I think that part of why Eliezer's early stuff sounds weird is:

- He generally had a lower opinion of the competence of elites in business, science, etc. (Which he later updated about.)
- He had a lower opinion of the field of AI in particular, as it existed in the 1990s and 2000s. Maybe more like nutrition science or continental philosophy than like chemistry, on the scale of 'field rigor and intellectual output'.

If you think of A(G)I as a weird, neglected, pre-paradigmatic field that gets very little attention outside of science fiction writing, then it's less surprising to think it's possible to make big, fast strides in the field. Outperforming a competitive market is very different from outperforming a small, niche market where very little high-quality effort is going into trying new things.

Similarly, if you have a lower opinion of elites, you should be more willing to endorse weird, fringe ideas, because you should be less confident that the mainstream is efficient relative to you. (And I think Eliezer still has a low opinion of elites on some very important dimensions, compared to a lot of EAs. But not to the same degrees as teenaged Eliezer.)

From Competent Elites: And from Above-Average AI Scientists:

I think this, plus Eliezer's general 'fuck it, I'm gonna call it like I see it rather than be reflexively respectful to authority' attitude, explains most of Ben's 'holy shit, your views were so weird!!' thing.

16

Lorenzo Buonanno

Jun 19 2022

> I believe Drexler is now giving the ballpark figure of 2013. My own guess would be no later than 2010…

I didn't see the "my own guess" part in the linked document (or the archived version), but it seems to be visible here; it was probably edited between 2001 and 2004. I'm mentioning this in case others are confused after trying to find the quote in context.

13

[anonymous]

Jun 25 2022

Perhaps also relevant, though it isn’t forecasting, is Eliezer’s weak (in my opinion) attempted takedown of Ajeya Cotra’s bioanchors report on AI timelines. Here’s Eliezer’s bioanchors takedown attempt, here’s Holden Karnofsky’s response to Eliezer, and here’s Scott Alexander’s response.

RobBensinger

Jun 28 2022

Eliezer's post was less a takedown of the report, and more a takedown of the idea that the report provides a strong basis for expecting AGI in ~2050, or for discriminating scenarios like 'AGI in 2030', 'AGI in 2050', and 'AGI in 2070'. The report itself was quite hedged, and Holden posted a follow-up clarification emphasizing that “biological anchors” is about bounding, not pinpointing, AI timelines. So it's not clear to me that Eliezer and Ajeya/Holden/etc. even disagree about the core question "do biological anchors provide a strong case for putting a median AGI year in ~2050?", though maybe they disagree on the secondary question of how useful the "bounds" are. Copying over my high-level view, which I recently wrote on Twitter:

RobBensinger

Jun 28 2022

Commenting on a few minor points from Scott's post, since I meant to write a full reply at some point but haven't had the time: I'd say 'clearly not, for some possible AI designs'; but maybe it will be true for the first AIs we actually build, shrug. Why aren't there examples like 'amount of cargo a bird can carry compared to an airplane', or 'number of digits a human can multiply together in ten seconds compared to a computer'? Seems like you'll get a skewed number if your brainstorming process steers away from examples like these altogether. 'AI physicist' is less like an artificial heart (trying to exactly replicate the structure of a biological organ functioning within a specific body), more like a calculator (trying to do a certain kind of cognitive work, without any constraint at all to do it in a human-like way).

5

MichaelDickens

Jun 23 2022

I read this post kind of quickly, so apologies if I'm misunderstanding. It seems to me that this post's claim is basically:

- Eliezer wrote some arguments about what he believes about AI safety.
- People updated toward Eliezer's beliefs.
- Therefore, people defer too much to Eliezer.

I think this is dismissing a different (and much more likely IMO) possibility, which is that Eliezer's arguments were good, and people updated based on the strength of the arguments.

(Even if his recent posts didn't contain novel arguments, the arguments still could have been novel to many readers.)

Linch

Jun 23 2022

I'm a bit confused by both this post and comments about questions like what level/timing the deference happens. Speaking for myself, if an internet rando wrote a random blog post called "AGI Ruin: A List of Lethalities," I probably would not read it. But I did read Yudkowsky's post carefully and thought about it nontrivially, mostly due to his track record and writing ability (rather than e.g. because the title was engaging or because the first paragraph was really well-argued).

4

Jack Malde

Jun 22 2022

I'm confused by the fact Eliezer's post was posted on April Fool's day. To what extent does that contribute to conscious exaggeration on his part?

Guy Raveh

Jun 22 2022

Right? Up to reading this post, I was convinced it was an April Fool's post.

RobBensinger

Jun 23 2022

The post is serious. Details: https://www.lesswrong.com/posts/j9Q8bRmwCgXRYAgcJ/miri-announces-new-death-with-dignity-strategy?commentId=FounAZsg4kFxBDiXs

Dr. David Mathers

Jun 23 2022

It seems really bad, from a communications/PR point of view, to write something that was ambiguous in this way. Like, bad enough that it makes me slightly worried that MIRI will commit some kind of big communications error that gets into the newspapers and does big damage to the reputation of EA as a whole.

4

VictorSintNicolaas

Jun 19 2022

As someone not active in the field of AI risk, and having always used epistemic deference quite heavily, this feels very helpful. I hope it doesn't end up reducing society's efforts to stop AI from taking over the world some day.

25

JulianHazell

Jun 19 2022

On the contrary, my best guess is that the “dying with dignity” style dooming is harming the community’s ability to tackle AI risk as effectively as it otherwise could

3

David Johnston

Jun 20 2022

I agree with many of the comments here that this is overall a bit unfair, and there are good reasons to take Yudkowsky seriously even if you don't automatically accept his self-expressed level of confidence.

My main criticism of Yudkowsky is that he has many innovative/somewhat compelling ideas, but even with many years and a research institution their evolution has been unsatisfying. Many of them are still imprecise, and some of those that are precise(ish) are not satisfactory (e.g. the orthogonality thesis, mesa-optimizers). Furthermore, he still doesn't seem very interested in improving this situation.

1

Zach Stein-Perlman

Jun 19 2022

Almost all of this seems reasonable. But:

> Yudkowsky has previously held short AI timeline views that turned out to be wrong

I don't think we should update based on this, or eg on the fact that we didn't go extinct due to nanotechnology, because anthropics / observer selection. (We should only update based on whether we think the reasons for those beliefs were bad.)

48

Derek Shiller

Jun 19 2022

Suppose you've been captured by some terrorists and you're tied up with your friend Eli. There is a device on the other side of the room that you can't quite make out. Your friend Eli says that he can tell (he's 99% sure) it is a bomb and that it is rigged to go off randomly. Every minute, he's confident there's a 50-50 chance it will explode, killing both of you. You wait a minute and it doesn't explode. You wait 10. You wait 12 hours. Nothing. He starts eying the light fixture, and says he's pretty sure there's a bomb there too. You believe him?

23

Zach Stein-Perlman

Jun 19 2022

No, my survival for 12 hours is evidence against Eli being correct about the bomb.

So: oops, I think.
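The update in the bomb example can be worked through numerically. This is a minimal sketch under stated assumptions: it treats Eli's 99% as the prior, takes the 50% per-minute explosion chance at face value, and assumes survival is certain if there is no bomb. The function name `posterior_bomb` is made up for the illustration.

```python
def posterior_bomb(prior, p_explode_per_min, minutes_survived):
    """Posterior probability that Eli is right, given survival so far."""
    # Likelihood of surviving this long if the bomb hypothesis is true.
    survive_if_bomb = (1.0 - p_explode_per_min) ** minutes_survived
    # Assumption: with no bomb, survival is certain.
    survive_if_no_bomb = 1.0
    numerator = prior * survive_if_bomb
    return numerator / (numerator + (1.0 - prior) * survive_if_no_bomb)


# One minute, ten minutes, and twelve hours (720 minutes) of survival.
for minutes in (0, 1, 10, 720):
    print(minutes, posterior_bomb(0.99, 0.5, minutes))
```

Under these assumptions the posterior falls from 99% to about 98% after one minute and to about 9% after ten, and after twelve hours it is effectively zero. Note that this calculation ignores the observer-selection worry raised earlier in the thread; it only shows what a straightforward Bayesian update gives.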

Zach Stein-Perlman

Jun 20 2022

I'm still not totally comfortable. I think my confusion arose because I was considering the related question of whether I could use my better knowledge than Eli to win money from bets (in expectation) -- I couldn't, because Eli has no reason to bet on the bomb going off. More generally, Eliezer never had reason to bet (in the sense that he gets epistemic credit if he's right) on nanotech-doom-by-2010, because in the worlds where he's right we're dead. It feels weird to update against Eliezer on the basis of beliefs that he wouldn't have bet on; updating against him doesn't seem to be incentive-compatible... but maybe that's just the sacrifice immanent to the epistemic virtue of publicly sharing your belief in doom.

rhollerith

Jun 19 2022

I am willing to bite your bullet. I had a comment here explaining my reasoning, but deleted it because I plan to make a post instead.

0

Yonatan Cale

Jun 20 2022

I think posts like this would do better to open with "but consider forming your own opinions rather than relying on experts".

0

𝕮𝖎𝖓𝖊𝖗𝖆

Jun 19 2022

I prefer to just analyse and refute his concrete arguments on the object level.

I'm not a fan of engaging the person of the arguer instead of their arguments.

Granted, I don't practice epistemic deference in regards to AI risk (so I'm not the target audience here), but I'm really not a fan of this kind of post. It rubs me the wrong way.

Challenging someone's overall credibility instead of their concrete arguments feels like bad form and [logical rudeness](https://www.lesswrong.com/posts/srge9MCLHSiwzaX6r/logical-rudeness).

I wish EAs did not engage in such be... (read more)

61

bgarfinkel

Jun 19 2022

> I prefer to just analyse and refute his concrete arguments on the object level.

I agree that work analyzing specific arguments is, overall, more useful than work analyzing individual people's track records. Personally, partly for that reason, I've actually done a decent amount of public argument analysis (e.g. here, here, and most recently here) but never written a post like this before.

Still, I think, people do in practice tend to engage in epistemic deference. (I think that even people who don't consciously practice epistemic deference tend to be influenced by the views of people they respect.) I also think that people should practice some level of epistemic deference, particularly if they're new to an area. So - in that sense - I think this kind of track record analysis is still worth doing, even if it's overall less useful than argument analysis.

14

𝕮𝖎𝖓𝖊𝖗𝖆

Jun 19 2022

(I hadn't seen this reply when I made my other reply).

What do you think of legitimising behaviour that calls out the credibility of other community members in the future?

I am worried about displacing the concrete object level arguments as the sole domain of engagement. A culture in which arguments cannot be allowed to stand by themselves. In which people have to be concerned about prior credibility, track record and legitimacy when formulating their arguments...

It feels like a worse epistemic culture.

Karthik Tadepalli

Jun 19 2022

Expert opinion has always been a substitute for object level arguments because of deference culture. Nobody has object level arguments for why x-risk in the 21st century is around 1/6: we just think it might be because Toby Ord says so and he is very credible. Is this ideal? No. But we do it because expert priors are the second best alternative when there is no data to base our judgments off of. Given this, I think criticizing an expert's priors is functionally an object level argument, since the expert's prior is so often used as a substitute for object level analysis. I agree that a slippery slope endpoint would be bad but I do not think criticizing expert priors takes us there.

𝕮𝖎𝖓𝖊𝖗𝖆

Jun 19 2022

To expand on my complaints in the above comment: I do not want an epistemic culture that finds it acceptable to challenge an individual's overall credibility in lieu of directly engaging with their arguments. I think that's unhealthy and contrary to collaborative knowledge growing. Yudkowsky has laid out his arguments for doom at length. I don't fully agree with those arguments (I believe he's mistaken in 2-3 serious and important ways), but he has laid them out, and I can disagree on the object level with him because of that. Given that the explicit arguments are present, I would prefer posts that engaged with and directly refuted the arguments if you found them flawed in some way. I don't like this direction of attacking his overall credibility. Attacking someone's credibility in lieu of their arguments feels like a severe epistemic transgression. I am not convinced that the community is better for a norm that accepts such epistemic call-out posts.

39

bgarfinkel

Jun 19 2022

> I do not want an epistemic culture that finds it acceptable to challenge an individual's overall credibility in lieu of directly engaging with their arguments.

I think I roughly agree with you on this point, although I would guess I have at least a somewhat weaker version of your view. If discourse about people's track records or reliability starts taking up (e.g.) more than a fifth of the space that object-level argument does, within the most engaged core of people, then I do think that will tend to suggest an unhealthy or at least not-very-intellectually-productive community.

One caveat: For less engaged people, I do actually think it can make sense to spend most of your time thinking about questions around deference. If I'm only going to spend ten hours thinking about nanotechnology risk, for example, then I might actually want to spend most of this time trying to get a sense of what different people believe and how much weight I should give their views; I'm probably not going to be able to make a ton of headway getting a good gears-level-understanding of the relevant issues, particularly as someone without a chemistry or engineering background.

28

Holly_Elmore

Jun 19 2022

> I do not want an epistemic culture that finds it acceptable to challenge an individual's overall credibility in lieu of directly engaging with their arguments.

I think it's fair to talk about a person's lifetime performance when we are talking about forecasting. When we don't have the expertise ourselves, all we have to go on is what little we understand and the track records of the experts we defer to. Many people defer to Eliezer so I think it's a service to lay out his track record so that we can know how meaningful his levels of confidence and special insights into this kind of problem are.

Guy Raveh

Jun 20 2022

I don't think this is realistic. There is much more important knowledge than one can engage with in a lifetime. The only way of forming views about many things is to somehow decide who to listen to, or at least how to aggregate the relevant, better-grounded opinions (so, who to count as an expert and who not to, and with what weight).

-1

genidma

Jun 25 2022

Tldr

Personally, and from my very uneducated vantage point, I question why a superintelligence with a truly universal set of ethics would pose a risk to other lifeforms. But I also do not know how the initial conditions can be architected, if indeed they can be set or architected at all. That could go a number of different ways, depending on whose values.
What I worry about is what humans (enhanced or not) and cyborgs may choose to do with the bread-crumbs (the leftovers), or with the steps taken to get to AGI.

Here is a schematic (link below) that I starte... (read more)

-24

Charles He

Jun 19 2022
