Comments (13)
Might be a naive question:

For a STEM-capable AGI (or any intelligence for that matter) to do new science, it would have to interact with the physical environment to conduct experiments. Otherwise, how can the intelligent agent discover and validate new theories? For example, an AGI that understands physics and materials science may theorize and propose thousands of possible high-temperature superconductors, but discovering a working material can happen only after actually synthesizing those materials and running the experiments, which is time-consuming and difficult.

If that's true, then the speed at which the STEM-capable AGI discovers new knowledge, and correspondingly its "knowledge advantage" (not intelligence advantage) over humanity, is bottlenecked by the speed at which the AGI can interact with and perform experiments in the physical world, which as of now depends almost entirely on human-operated equipment and is constrained by various real-world physical limitations (wear and tear, speed of chemical reactions, speed of biological systems, energy consumption, etc.). Doesn't this significantly throttle the speed at which AGI gains an advantage over humanity, giving us more time for alignment?

For a STEM-capable AGI (or any intelligence for that matter) to do new science, it would have to interact with the physical environment to conduct experiments.

Or read arXiv papers and draw inferences that humans failed to draw, etc.

Doesn't this significantly throttle the speed at which AGI gains an advantage over humanity, giving us more time for alignment?

I expect there's a ton of useful stuff you can learn (that humanity is currently ignorant about) just from looking at existing data on the Internet. But I agree that AGI will destroy the world a little slower in expectation because it may get bottlenecked on running experiments, and it's at least conceivable that at least one project will decide not to let it run tons of physical experiments.

(Though I think the most promising ways to save the world involve AGIs running large numbers of physical experiments, so in addition to merely delaying AGI doom by some number of months, 'major labs don't let AGIs run physical experiments' plausibly rules out the small number of scenarios where humanity has a chance of surviving.)

I expect there's a ton of useful stuff you can learn (that humanity is currently ignorant about) just from looking at existing data on the Internet. 

Thank you for the reply; I agree with this point. Now that I think about it, protein folding is a good example: the data was already available, but before AlphaFold nobody could predict structure from sequence with high accuracy. Maybe a sufficiently smart AGI can get more knowledge out of existing data on the internet without performing too many new experiments.

What may matter is how much more it can squeeze out of existing data (which was not generated specifically with the AGI's new hypotheses in mind), and whether that can give it a decisive advantage over humanity in a short span of time, i.e., whether existing data already contains enough information to figure out new science that is completely beyond our current understanding and could totally screw us.

I would argue that an important component of your first argument still stands. Even though AlphaFold can predict structures to some level of accuracy based on training data sets that already exist, an AI would STILL need to check whether what it learned is usable in practice for its intended purposes. This logically requires experimentation. Also keep in mind that most data that already exists was not deliberately prepared to help a machine "do X". Any intelligence, no matter how strong, will still need to check its hypotheses and thus prepare data sets that can actually deliver the evidence necessary for drawing warranted conclusions.

I am not really sure what the consequences of this are, though. 

I think a sufficiently capable intelligence can generate accurate beliefs from evidence generally, not just 'experiments', and not just its own experiments. I imagine AIs will be suggesting experiments too (if they're not already).

It is still plausible that not being able to run its own experiments will greatly hamper an AI's scientific agenda, but it's harder to know exactly how much it will for intelligences likely to be much more intelligent than ourselves.

Afaik it is pretty well established that you cannot really learn anything new without actually testing your new belief in practice, i.e., with experiments. I mean, how else would this work? Evidence does not grow on trees; it has to be created (i.e., data has to be carefully generated, selected and interpreted to become useful evidence).

While it might be true that this experimenting can sometimes be done using existing data, the point is that if you want to learn something new about the universe, like "what is dark matter and can it be used for something?", existing data is unlikely to be enough to test any idea you come up with.

Even if you take data from published academic papers and synthesize some new theories from that, it is still not always (or even likely) the case that the theory you come up with can be tested with already existing data, because any theory has its own requirements for what counts as evidence against it. I mean, that's the whole point of why we continue to do experiments rather than just meta-analyze the sh*t out of all the papers out there.

Of course, advanced AI could trick us into doing certain experiments, or, looking at ChatGPT plugins, we may just give it wholesale access to anything on the internet in due time, so all of this may just be a short bump in the road. If we are lucky, though, we might avoid a FOOM-style takeover as long as advanced AI remains dependent on us to carry out its experiments for it, simply because of the time those experiments will take. So even if it could bootstrap to nanotech quickly thanks to a good understanding of physics based on our formulas and existing data, the first manufacturing machine or factory would still need to be built somehow, and that may take some time.

I feel the weakest part of this argument, and the weakest part of the AI Safety space generally, is the part where AI kills everyone (part 2, in this case).

You argue that most paths to some ambitious goal like whole-brain emulation end terribly for humans, because how else could the AI do whole-brain emulation without subjugating, eliminating or atomising everyone?

I don't think that follows. This seems like what the average hunter-gatherer would have thought when made to imagine our modern commercial airlines or microprocessor industries: how could you achieve something requiring so much research, so many resources and so much coordination without enslaving huge swathes of society and killing anyone that gets in the way? And wouldn't the knowledge to do these things cause terrible new dangers?

Luckily the hunter-gatherer is wrong: the path here has led up a slope of gradually increasing quality of life (some disagree).

I think the point is not that it is inconceivable for progress to continue with humans still alive, but rather the game-theoretic dilemma that whatever we humans want to do is unlikely to be exactly what some super-powerful advanced AI would want to do. And because the advanced AI does not need us or depend on us, we simply lose and get to be ingredients for whatever that advanced AI is up to.

Your example with humanity fails because humans have always been, and continue to be, a social species whose members depend on each other. An unaligned advanced AI would not be. A more appropriate example would be the relationship between humans and insects. I don't know if you've noticed, but a lot of those are dying out right now because we simply don't care about or depend on them. The point with advanced AI is that, because it is potentially even more removed from us than we are from insects, and also much more capable of achieving its goals, the whole competitive process we all engage in is going to become much faster and more competitive once advanced AIs start playing the game.

I don't want to be the bearer of bad news but I think it is not that easy to reject this analysis... it seems pretty simple and solid. I would love to know if there is some flaw in the reasoning. Would help me sleep better at night! 

Your example with humanity fails because humans have always been, and continue to be, a social species whose members depend on each other.

I would much more say that it fails because humans have human values.

Maybe a hunter-gatherer would have worried that building airplanes would somehow cause a catastrophe? I don't exactly see why; the obvious hunter-gatherer rejoinder could be 'we built fire and spears and our lives only improved; why would building wings to fly make anything bad happen?'.

Regardless, it doesn't seem like you can get much mileage via an analogy that sticks entirely to humans. Humans are indeed safe, because "safety" is indexed to human values; when we try to reason about non-human optimizers, we tend to anthropomorphize them and implicitly assume that they'll be safe for many of the same reasons. Cf. The Tragedy of Group Selectionism and Anthropomorphic Optimism.

You argue that most paths to some ambitious goal like whole-brain emulation end terribly for humans, because how else could the AI do whole-brain emulation without subjugating, eliminating or atomising everyone?

'Wow, I can't imagine a way to do something so ambitious without causing lots of carnage in the process' is definitely not the argument! On the contrary, I think it's pretty trivial to get good outcomes from humans via a wide variety of different ways we could build WBE ourselves.

The instrumental convergence argument isn't 'I can't imagine a way to do this without killing everyone'; it's that sufficiently powerful optimization behaves like maximizing optimization for practical purposes, and maximizing-ish optimization is dangerous if your terminal values aren't included in the objective being maximized.

If it helps, we could maybe break the disagreement about instrumental convergence into three parts, like:

  • Would a sufficiently powerful paperclip maximizer kill all humans, given the opportunity?
  • Would sufficiently powerful inhuman optimization of most goals kill all humans, or are paperclips an exception?
  • Is 'build fast-running human whole-brain emulation' an ambitious enough task to fall under the 'sufficiently powerful' criterion above? Or, if it is, is there some other reason random policies might be safe when directed at this task, even if they wouldn't be safe for other similarly hard tasks?

The step that's missing for me is the one where the paperclip maximiser gets the opportunity to kill everyone.

Your talk of "plans" and the dangers of executing them seems to assume that the AI has all the power it needs to execute the plans. I don't think the AI crowd has done enough to demonstrate how this could happen.

If you drop a naked human in amongst some wolves, I don't think the human will do very well despite its different goals and enormous intellectual advantage. Similarly, I don't see how a fledgling sentient AGI on OpenAI servers can take over enough infrastructure that it poses a serious threat. I've not seen a convincing theory for how this would happen. Mail-order nanobots seem unrealistic (too hard to simulate the quantum effects in protein chemistry); the AI talking itself out of its box is another suggestion that seems far-fetched (the main evidence seems to be some chat games that Yudkowsky played a few times?); and a gradual takeover via its voluntary uptake into more and more of our lives seems slow enough to stop.

Is your question basically how an AGI would gain power in the beginning in order to get to a point where it could execute on a plan to annihilate humans?

I would argue that:

  • Capitalists would quite readily give the AGI all the power it wants, in order to stay competitive and drive profits.
  • Some number of people would deliberately help the AGI gain power just to "see what happens" or specifically to hurt humanity. Think ChaosGPT, or consider the story of David Charles Hahn.
  • Some number of lonely, depressed, or desperate people could be persuaded over social media to carry out actions in the real world.

Considering these channels, I'd say that a sufficiently intelligent AGI with as much access to the real world as ChatGPT has now would have all the power needed to increase its power to the point of being able to annihilate humans.

Thank you for taking the time to write this - I think it is a clear and concise entry point into the AGI ruin arguments.

I want to voice an objection to / point out an omission in point 2: I agree that any plan towards a sufficiently complicated goal will include "acquire resources" as a sub-goal, and that "getting rid of all humans" might be a by-product of some ways of achieving this sub-goal. I'm also willing to grant that if all we know about the plan is that it achieves the end (sufficiently complicated) goal, then the plan is likely to lead to the destruction of all humans.

However, I don't see why we can't infer more about the plans. Specifically, I think an ASI's plan for a sufficiently complicated goal should be 1) feasible and 2) efficient (at least in some sense). If the ASI doesn't believe it can overpower humanity, then its plans will not include overpowering humanity. Moreover, if the ASI ascribes a high enough cost to overpowering humanity, it would instead opt to acquire resources in another way.

It seems that for point 2 to hold, you must think that an ASI can overpower humanity 1) with close to 100% certainty and 2) at negligible cost to itself. However, I don't think this is (explicitly) argued for in the article. Or maybe I'm missing something?

Thanks, I found this informative!
