Vojta Kovarik. AI alignment and game theory researcher.
I suggest adding a tl;dr section to the top of the post. Or maybe changing the title to something like Why "just make an agent which cares only about binary rewards" doesn't work.
Reasoning: To me, the considerations in the post mostly read as rehashing standard arguments, which one should already be familiar with if they have thought about the problem themselves, gone through AGI Safety Fundamentals, etc. It might be interesting to some people, but it would be good to have a clear indication that this isn't novel.
Also: When I read the start of the post, I went "obviously this doesn't work". Then I spent several minutes reading the post to find the flaw in your argument and point it out, only to discover that your conclusion is "yeah, this doesn't help". If you edit the post, you might save other people from wasting their time in a similar manner :-).
I am at high P(doom|AGI pre-2035), but not at near-certainty. Say, 75% but not 99.9%.
The reason for that is that I find both "fast takeoff takeover" and "continuous multipolar takeoff" scenarios plausible (with no decisive evidence for one or the other). In "continuous multipolar takeoff", you still get superintelligences running around. However, they would be "superintelligent with respect to civilization-2023" but not necessarily wrt civilization-then. And for the standard somewhat-well-thought-out AI takeover arguments to apply, you need to be superintelligent wrt civilization-then.
Two disclaimers: (1) Just because you don't get a discontinuity in influence around human level does not mean you can't get it later. In my book, the world can look "Christiano-like", until suddenly it looks "Yudkowsky-like". (2) Even if we never get an AI singleton, things can still go horribly wrong (ie, Christiano's "What failure looks like"). But imo those scenarios are much harder to reason about, and we haven't thought them out in enough detail to justify high certainty of either outcome.
My intuitive aggregation of this gives, say, 80% P(doom this century|AGI pre-2035). On top of that, I add some 5-10% on "I am so wrong about some of this that even the high-level reasoning doesn't apply". (Which includes being wrong about where the burden of proof, and priors, lie for P(doom|AGI).) And that puts me at the (ass-)number 75%.
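To make the arithmetic explicit (my reconstruction; the exact weighting, and the doom probability in the "I am so wrong" worlds, are not pinned down above, so treat this as an illustrative sketch):

$$P(\text{doom} \mid \text{AGI pre-2035}) \approx (1 - 0.075)\cdot 0.80 + 0.075\cdot p_{\text{wrong}} \approx 0.74 + 0.075\,p_{\text{wrong}} \approx 0.75,$$

where $0.075$ is the midpoint of the 5-10% "high-level reasoning doesn't apply" mass and $p_{\text{wrong}}$ is the (unspecified, presumably much lower) doom probability in those worlds.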
Nitpicky feedback on the presentation:
If I am understanding it correctly, the current format of the tables makes them fundamentally incapable of expressing evidence for insects being unable to feel pain. (The colour coding goes from green=evidence for to red=no evidence, and how would you express ??=evidence against?) I would be more comfortable with a format without this issue, in particular since it seems justified to expect the authors to be biased towards wanting to find evidence for. [Just to be clear, I am not pushing against the results, or against caring about insects. Just against the particular presentation :-).]
After thinking about it more, I would interpret (parts of) the post as follows:
Is this interpretation correct? If so, then I register the complaint that the post is a bit confusing --- not particularly sure why, just noticing that it made me confused. Perhaps it's the thing where I first understood the tables/conclusions as "how much pain do these types of insects feel?". (And I expect others might get similarly confused.)
I saw the line "found no good evidence that anything failed any criterion", but just to check explicitly: What do the confidence levels mean? In particular, should I read "low confidence" as "weak evidence that X feels pain-as-operationalized-by-Criterion Y"? Or as "strong evidence that X does not feel pain-as-operationalized-by-Criterion Y"?
In other words:
"National greatness, and staying ahead of China for world influence requires that we have the biggest economy. To do that, we need more people." -Matt Yglesias, One Billion Americans.
Yeah, the guy who has chosen to have one child is going to inspire me to make the sacrifices involved in having four. It might be good for America, but the ‘ask’ here looks like it is that I sacrifice my utility for Matt’s one kid, and thus is not cooperate-cooperate. I’ll jump when you jump.
Two pushbacks here:
(1) The counterargument seems rather weak here, right? Even if Matt Yglesias had no kids, that doesn't mean his argument isn't valid. EG, if a single non-vegan person claims that more people should be vegan, will you view that as evidence that people should not be vegan? ;-) (Not that I disagree with your claim. Just with your argument.)
(2) Did you actually read One Billion Americans, or are you just taking the citation and interpreting it as Matt Yglesias making an argument for people having more children? I didn't read the book, so I am not sure. But I listened to a podcast with Matt Yglesias about the book, and my impression was that he was primarily arguing for changing immigration policy, and (if memory serves) not really making any strong claims about how many kids people should have.
Yes, sure, probabilities are only in the map. But I don't think that matters for this. Or I just don't see what argument you are making here. (CLT is in the map, expectations are taken in the map, and decisions are made in the map (then somehow translated into the territory via actions). I don't see how that says anything about what EV reasoning relies on.)
Agree with Acylhalide's point - you only need to be non-Dutchbookable by bets that you could actually be exposed to.
To address a potential misunderstanding: I agree with both Sharmake's examples. But they don't imply you have to maximise expected utility always. Just when the assumptions apply.
More generally: expected utility maximisation is an instrumental principle. But it is justified by some assumptions, which don't always hold.
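A toy illustration of the kind of case I have in mind (my own example, not from the thread): a one-shot, all-in bet that doubles your wealth $W$ with probability 0.51 and zeroes it with probability 0.49 maximizes expected monetary value,

$$\mathbb{E}[\text{wealth}] = 0.51 \cdot 2W + 0.49 \cdot 0 = 1.02\,W > W,$$

yet the usual justifications for taking it (many repetitions plus a CLT-style argument, or avoiding Dutch books you will actually be offered) don't apply to a single all-or-nothing gamble, so nothing forces you to take the bet.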
Re “Middle management is toxic, we should avoid it.”:
I want to flag that your counterargument here does not properly address the points from Middle Manager Hell / the Immoral Mazes sequences. (Less constructively, "middle management being toxic" seems like quite a weak version of the arguments against large orgs, which suggests that your counterargument might not work against the stronger version. More constructively, one difference between the current EA structure and large orgs is that small EA orgs are not married to a single funder. This imo reduces the "toxicity" you might otherwise get from the incentive structure in large companies. There might be other important differences; I just haven't thought about this enough.)
All that said, perhaps we can get the best of both worlds by using larger orgs for some things but not all? And inventing some tools that make it easier to get the benefits you want without all of the costs? (Example: something that allows people to temporarily/tentatively switch jobs without having to deal with all the paperwork.)