
Pivocajs

83 karma

Bio

Vojta Kovarik. AI alignment and game theory researcher.

Comments (24) · Topic contributions (2)

I think the summary at the start of this post is too easy to misinterpret as "if you think of yourself as a smart and moral person, it's ok to go work for these companies".

(None of the things the summary says seem false. But the overall impression seems too vulnerable to rationalisation along the lines of "surely I would not fall prey to these bad incentives", when in reality most people probably do fall prey to them. So at the minimum, it might be fairer to change the recommendation to something like "it's complicated, but err on the side of not joining" or "it's complicated, but we wouldn't recommend this for 95% of people who can get a job at these companies"[1].)

  1. ^

    Or whatever qualifier you think is fair. The main point is to make it clear that the warnings apply to the reader as well, not just to "all the other people".

In my opinion, the main relevant alternative to this view is to be partial to the human species, as opposed to being partial to either one's current generation, or oneself. And I think the human species is kind of a weird category to be partial to, relative to those other things. Do you disagree?

I agree with this.

"the best way to advance your own values is generally to actually 'be there' when AI happens."

I (strongly) disagree with this. Me being alive is a relatively small part of my values. And since I am not the director of the world, me personally being around to influence things is unlikely to have a decisive impact on things I value.

In more detail: Sure, all else being equal, me being there when AI happens is mildly helpful. But the outcome of building AI seems to be a function of, among other things, (i) values of the people building it + (ii) how much reflection they can do on those values + (iii) the environment dynamics these people are subject to (e.g., the current race dynamics between AI companies). And over time, I expect the potential decrease in (i) to be far outweighed by gains in (ii) and (iii).

  • The first issue is about (i): it is not actually me building the AGI, either now or in the future. But I am willing to grant that (all else being equal) the current generation is more likely to have values closer to mine.
  • However, I expect that factors (ii) and (iii) are just as influential. Regarding (ii), it seems we keep making progress in philosophy, ethics, etc., and to me, this currently far outweighs the value drift in (i).
  • Regarding (iii), my impression is that the current situation is so bad that it can't get much worse, and we might as well wait. This of course depends on how likely you think we are to get a bad outcome if we either (a) get superintelligence without additional progress on alignment or (b) get widespread human-level AI with no progress on alignment, institution design, etc.

My personal reason for not digging into this is that my naive model of how good the AI future will be is: quality_of_future * amount_of_the_stuff. And there is a distinction I haven't seen you acknowledge: while high "quality" doesn't require humans to be around, I ultimately judge quality by my values. (Things being conscious is an example. But this also includes things like not copy-pasting the same thing all over, not wiping out aliens, and presumably many other things I am not aware of. IIRC Yudkowsky talks about cosmopolitanism being a human value.) Because of this, my impression is that if we hand over the future to a random AI, the "quality" will be very low. And so we can currently have a much larger impact by focusing on increasing the quality. Which we can do by delaying "handing over the future to AI" and picking a good AI to hand over to. I.e., alignment.
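As a minimal sketch of that naive model (the function and all numbers below are made up purely for illustration, not estimates):

```python
# Toy sketch of the quality_of_future * amount_of_the_stuff model.
# All numbers here are made up for illustration, not estimates.

def future_value(quality: float, amount: float) -> float:
    """quality: how well the future matches my values (0 to 1).
    amount: how much stuff/experience there is (arbitrary units)."""
    return quality * amount

# Handing the future to a "random" AI: plenty of stuff, but very low quality.
random_ai = future_value(quality=0.01, amount=1000)

# An aligned hand-over: even with less stuff, higher quality dominates.
aligned = future_value(quality=0.9, amount=100)

print(random_ai, aligned)  # 10.0 vs 90.0
```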

(Still, I agree it would be nice if there was a better analysis of this, which exposed the assumptions.)

In terms of feedback/reaction: I work on AI alignment, game theory, and cooperative AI, so Moloch is basically my key concern. And from that position, I highly approve of the overall talk, and of basically all of its content --- except for one point, where I felt a bit so-so. And that is the part about what the company leaders can do to help the situation.

The key thing is 9:58-10:09 ("We need leaders who are willing to flip the Moloch's playbook. ..."), but I think this part then changes how people interpret 10:59-10:11 ("Perhaps companies can start competing over who ..."). I don't mean to say that I strongly disagree here --- rather, I mean that this part seems objectively speculative, which contrasted with everything else in the talk (which seemed super solid).

More specifically, the talk's formulation suggested to me that the key thing is whether the leaders would be willing to not play the Moloch game. In contrast, it seems quite possible that this by itself wouldn't help at all, for example because they would just get fired if they tried. My personal guess is that "the key thing" is the affordance the leaders have for not playing the Moloch game / the costs they incur for doing so. Or perhaps the combination of this and the willingness to not play the Moloch game. And this is also how I would frame the 10:59-10:11 part --- that we should try to make it such that the companies can compete on those other things that turn this into a race to the top. (As opposed to "the companies should compete on those other things".)

Re “Middle management is toxic, we should avoid it.”:

I want to flag that your counterargument here does not properly address the points from Middle Manager Hell / the Immoral Mazes sequences. (Less constructively: "middle management being toxic" seems like quite a weak version of the arguments against large orgs, which suggests that your counterargument might not work against the stronger versions. More constructively: one difference between the current EA structure and large orgs is that small EA orgs are not married to a single funder. This imo reduces the "toxicity" you might otherwise get from the incentive structure in large companies. There might be other important differences; I just haven't thought about this enough.)

All that said, perhaps we can get the best of both worlds by using larger orgs for some things but not all? And by inventing some tools that make it easier to get the benefits you want without all of the costs? (Example: something that allows people to temporarily/tentatively switch jobs without having to deal with all the paperwork.)

Just to highlight a particular example: suppose you have a prediction market on "How much will be inflation of USD over the next 2 years?", that is priced in USD.
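As a made-up numeric sketch of one way this kind of setup can distort things (the threshold-style payout rule and all numbers below are assumptions for illustration, not any specific market design): a correct bet on high inflation gets paid out in the very dollars that just lost value.

```python
# Toy illustration (made-up numbers): suppose a share pays out $1 nominal
# if realized inflation exceeds some threshold. Winning a bet on "high
# inflation" means being paid in dollars that have just lost value, so the
# real payoff is smaller than the nominal one.

nominal_payout = 1.00      # $1 per winning share, paid out in 2 years

high_inflation = 0.15      # 15%/year in the world where the bet wins

# Real (inflation-adjusted) value of the payout in that world:
real_payout_if_win = nominal_payout / (1 + high_inflation) ** 2
print(round(real_payout_if_win, 3))  # ~0.756

# So the real return on a correct high-inflation bet is dampened, which can
# push the market price away from bettors' actual probabilities.
```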

I suggest editing the post to add a tl;dr section at the top. Or maybe changing the title to something like Why "just make an agent which cares only about binary rewards" doesn't work.


Reasoning: To me, the considerations in the post mostly read as rehashing standard arguments, which one should be familiar with if they have thought about the problem themselves, gone through AGI Safety Fundamentals, etc. It might be interesting to some people, but it would be good to have a clear indication that this isn't novel.

Also: When I read the start of the post, I went "obviously this doesn't work". Then I spent several minutes reading the post to see where the flaw in your argument was, so I could point it out --- only to find that your conclusion is "yeah, this doesn't help". If you edit the post, you might save other people from wasting their time in a similar manner :-).

I am at high P(doom|AGI pre-2035), but not at near-certainty. Say, 75% but not 99.9%.

The reason for that is that I find both "fast takeoff takeover" and "continuous multipolar takeoff" scenarios plausible (with no decisive evidence for one or the other). In "continuous multipolar takeoff", you still get superintelligences running around. However, they would be "superintelligent with respect to civilization-2023", but not necessarily wrt civilization-then. And for the standard somewhat-well-thought-out AI takeover arguments to apply, you need to be superintelligent wrt civilization-then.

Two disclaimers: (1) Just because you don't get a discontinuity in influence around human level does not mean you can't get it later. In my book, the world can look "Christiano-like" until suddenly it looks "Yudkowsky-like". (2) Even if we never get an AI singleton, things can still go horribly wrong (i.e., Christiano's "What failure looks like"). But imo those scenarios are much harder to reason about, and we haven't thought them out in enough detail to justify high certainty of either outcome.

My intuitive aggregation of this gives, say, 80% P(doom this century|AGI pre-2035). On top of that, I add some 5-10% on "I am so wrong about some of this that even the high-level reasoning doesn't apply". (Which includes being wrong about where the burden of proof, and the priors, lie for P(doom|AGI).) And that puts me at the (ass-)number 75%.
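A rough sketch of that arithmetic --- note that the way the 5-10% "I am badly wrong" mass is handled below (doom anywhere from 0% to 50% in that case) is an extra assumption, added just to show the shape of the calculation:

```python
# Back-of-the-envelope aggregation. The range of p_doom_if_badly_wrong is an
# illustrative assumption, not something pinned down above.

p_doom_if_reasoning_holds = 0.80   # intuitive aggregate over the scenarios
p_badly_wrong = 0.075              # middle of the 5-10% range

for p_doom_if_badly_wrong in (0.0, 0.25, 0.5):
    total = (p_doom_if_reasoning_holds * (1 - p_badly_wrong)
             + p_doom_if_badly_wrong * p_badly_wrong)
    print(round(total, 2))         # prints 0.74, 0.76, 0.78 -- roughly 75%
```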

Nitpicky feedback on the presentation:

If I am understanding it correctly, the current format of the tables makes them fundamentally incapable of expressing evidence for insects being unable to feel pain. (The colour coding goes from green=evidence for to red=no evidence, and how would you express ??=evidence against?) I would be more comfortable with a format without this issue, in particular since it seems justified to expect the authors to be biased towards wanting to find evidence for. [Just to be clear, I am not pushing against the results, or against caring about insects. Just against the particular presentation :-).]
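For concreteness, a format without this issue could use a diverging scale rather than a one-sided one. A sketch of what I mean (the cut-offs and labels below are entirely made up):

```python
# Sketch of a diverging colour scale that can also express evidence *against*
# a criterion. The cut-offs and labels are made up for illustration.

def colour_for_evidence(score: float) -> str:
    """score in [-1, 1]: -1 = strong evidence against the criterion,
    0 = no evidence either way, +1 = strong evidence for it."""
    if score <= -0.5:
        return "red (evidence against)"
    if score < 0:
        return "orange (weak evidence against)"
    if score == 0:
        return "white (no evidence)"
    if score < 0.5:
        return "light green (weak evidence for)"
    return "green (strong evidence for)"

print(colour_for_evidence(0.8))   # green (strong evidence for)
print(colour_for_evidence(-0.7))  # red (evidence against)
```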

After thinking about it more, I would interpret (parts of) the post as follows:

  • To the extent that we found research on these orders O and criteria C, each of the orders satisfies each of the criteria.
  • We are not saying anything about the degree to which a particular O satisfies a particular C. [Uhm, I am not sure why. Are the criteria extremely binary, even if you measure them statistically? Or were you looking at the degrees, and every O satisfied every C to a high enough degree that you just decided not to talk about it in the post?]
  • To recap: you don't talk about the degrees-of-satisfying-criteria, and any research that existed pointed towards sufficient-degree-of-C, for any O and C. Given this, the tables in this post essentially just depict "How much quality-adjusted research we found on this."
  • In particular, the tables do not depict anything like "Do we think these insects can feel pain, according to this measure?". Actually, you believe that probably once there is enough high-quality research, the research will conclude that all insects will satisfy all of the criteria. (Or all orders of insects sufficiently similar to the ones you studied.)
    [Here, I mean "believe" in the Bayesian sense where if you had to bet, this is what you would bet on. Not in the sense of you being confident that all the research will come up this way. In particular, no offense meant by this :-) .]

Is this interpretation correct? If so, then I register the complaint that the post is a bit confusing --- not particularly sure why, just noticing that it made me confused. Perhaps it's the thing where I first understood the tables/conclusions as "how much pain do these types of insects feel?". (And I expect others might get similarly confused.)

I saw the line "found no good evidence that anything failed any criterion", but just to check explicitly: What do the confidence levels mean? In particular, should I read "low confidence" as "weak evidence that X feels pain-as-operationalized-by-Criterion Y"? Or as "strong evidence that X does not feel pain-as-operationalized-by-Criterion Y"?

In other words:

  • Suppose you did the same evaluation for the order Rock-optera (uhm, I mean literal rocks). (And suppose there was literature on that :-).) What would the corresponding row look like? All white, or would you need to add a new colour for that?
  • Suppose you found 1000 high-quality papers on order X and Criterion Y, and all of them suggested that X is precisely borderline between satisfying Y vs not satisfying it. How would this show up in the tables?