Scott Alexander

2898 karma · Joined Aug 2021


I thought we already agreed the demon case showed that FDT wins in real life, since FDT agents will consistently end up with more utility than other agents.

Eliezer's argument is that you can become the kind of entity that is programmed to do X, by choosing to do X. This is in some ways a claim about demons (they are good enough to predict even the choices you made with "your free will"). But it sounds like we're in fact positing that demons are that good - I don't know how else to explain their 999,999-in-a-million success rate - so I think he is right.

I don't think the demon being wrong one in a million times changes much. 999,999 of the people created by the demon will be some kind of FDT decision theorist with great precommitment skills. If you're the one who isn't, you can observe that you're the demon's rare mistake and avoid cutting off your legs, but this just means you won the lottery - it's not a generally winning strategy.
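To make the "won the lottery" point concrete, here is a minimal sketch comparing the two policies from the point of view of someone choosing before the demon makes its prediction. The payoffs (being created worth 100, legs worth 10, never existing worth 0) and the rule that the demon creates you only if it predicts you'll cut off your legs are illustrative assumptions; only the 999,999-in-a-million accuracy comes from the discussion.

```python
from fractions import Fraction

# Illustrative payoffs (assumptions, not part of the original problem):
EXIST = 100     # utility of being created with your legs intact
LEG_COST = 10   # utility lost by cutting off your legs
# Never being created is assumed to be worth 0.

ACCURACY = Fraction(999_999, 1_000_000)  # demon's prediction success rate


def expected_utility(cuts_legs: bool, p: Fraction = ACCURACY) -> Fraction:
    """Expected utility of a policy, evaluated before the demon predicts it.

    Assumes the demon creates you only if it predicts you will cut off your legs.
    """
    if cuts_legs:
        # Correct prediction (prob p): you're created, then cut -> EXIST - LEG_COST.
        # Wrong prediction (prob 1 - p): you're never created -> 0.
        return p * (EXIST - LEG_COST)
    # Correct prediction (prob p): demon foresees refusal, never creates you -> 0.
    # Wrong prediction (prob 1 - p): created by mistake, keep your legs -> EXIST.
    return (1 - p) * EXIST


for policy in (True, False):
    print("cut" if policy else "keep", float(expected_utility(policy)))
```

With these numbers the cutting policy comes out at roughly 90 and the refusing policy at 0.0001: the refuser only does well in the one-in-a-million branch where the demon errs, which is a lottery win rather than a winning strategy.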

Decision theories are intended as theories of what is rational for you to do.  So it describes what choices are wise and which choices are foolish. 

I don't understand why you think that the choices that get you more utility with no drawbacks are foolish, and the choices that cost you utility for no reason are wise.

On the Newcomb's Problem post, Eliezer explicitly said that he doesn't care why other people are doing decision theory; he would like to figure out a way to get more utility. Then he did that. If you disagree with his goal, I think you should be arguing "decision theory should be about looking good, not about getting utility" (so we can all laugh at you), rather than saying "Eliezer is confidently and egregiously wrong" while hiding the fact that one of your main arguments is that he set out to get utility instead of failing all the time, and then came up with a strategy that successfully does that.

I think rather than say that Eliezer is wrong about decision theory, you should say that Eliezer's goal is to come up with a decision theory that helps him get utility, and your goal is something else, and you have both come up with very nice decision theories for achieving your goal.

(what is your goal?)

My opinion on your response to the demon question is "The demon would never create you in the first place, so who cares what you think?" That is, I think your formulation of the problem includes a paradox - we assume the demon is always right, but also, that you're in a perfect position to betray it and it can't stop you. What would actually happen is the demon would create a bunch of people with amputation fetishes, plus me and Eliezer who it knows wouldn't betray it, and it would never put you in the position of getting to make the choice in real life (as opposed to in an FDT algorithmic way) in the first place. The reason you find the demon example more compelling than the Newcomb example is that it starts by making an assumption that undermines the whole problem - that is, that the demon has failed its omniscience check and created you, who are destined to betray it. If your problem setup contains an implicit contradiction, you can prove anything.

I don't think this is as degenerate a case as "a demon will torture everyone who believes FDT". If that were true, and I expected to encounter that demon, I would simply try not to believe FDT (insofar as I can voluntarily change my beliefs). While you can always be screwed over by weird demons, I think decision theory is about what to choose in cases where you have all of the available knowledge and also a choice in the matter, and I think the leg demon fits that situation.

I guess any omniscient demon reading this to assess my ability to precommit will have learned I can't even precommit effectively to not having long back-and-forth discussions, let alone cutting my legs off. But I'm still interested in where you're coming from here since I don't think I've heard your exact position before.

Have you read ? Do you agree that this is our crux?

Would you endorse the statement "Eliezer, using his decision theory, will usually end up with more utility than me over a long life of encountering the sorts of weird demonic situations decision theorists analyze, I just think he is less formally-rational"?

Or do you expect that you will, over the long run, get more utility than him?

Sorry if I misunderstood your point. I agree this is the strongest objection against FDT. I think there is some sense in which I can become the kind of agent who cuts off their legs (ie by choosing to cut off my legs), but I admit this is poorly specified.

I think there's a stronger case for, right now, having heard about FDT for the first time, deciding I will follow FDT in the future. Various gods and demons can observe this and condition on my decision, so when the actual future comes around, they will treat me as an FDT-following agent rather than a non-FDT-following agent. Even though future-created-me isn't exactly in a position to influence the (long-since gone) demon, current me is in a position to make this decision for future relevant situations, and should decide to follow FDT in general. Part of this decision I've made involves being the kind of person who would take the FDT option in hypothetical scenarios.

Then there's the additional question of whether to defect against the demons/gods later, and say "Haha, back in August 2023 I resolved to become an FDT agent, and I fooled you into believing me, but now that I've been created I'm just going to not cut off my legs after all". I think of this as - suppose every past being created by the demon has cut off its legs, ie the demon has a 100% predictive success rate over millions of cases. So the demon would surely predict if I would do this. That means I should (now) try really hard not to do this. Cf. Parfit's Hitchhiker. Can I bind my future self like this? I think empirically yes - I think I have enough honor that if I tell hypothetical demon gods now that I'm going to do various things, I can actually do them when the time comes. This will be "irrational" in some sense, but I'll still end up with more utility than everyone else. 

Is there some sense in which, if I decide not to cut off my legs, I would wink out of existence? I admit feeling a superstitious temptation to believe this (a non-superstitious justification might be wondering if I'm the real me, or a version of me in the omniscient demon's simulation to predict what I would do). I think the literal answer is no but that it's practically useful to keep my superstitious belief in this to allow myself to do the irrational thing that gets me more utility. But this is a weird enough sidetrack that I'm really not sure I'm still in normal Eliezer-approved-decision-theory-land at all.

I think an easier question is whether you should program an AI to always keep its pre-emptive bargains with gods and demons; here the answer is just straightforwardly yes. You don't have to assume that your actions alter your algorithm, you can just alter the algorithm directly. I think this is what Eliezer is most interested in, though I'm not sure.

Were there bright people who said they had checked his work, understood it, agreed with him, and were trying to build on it? Or just people who weren't yet sure he was wrong?

I don't want to get into a long back-and-forth here, but for the record I still think you're misunderstanding what I flippantly described as "other Everett branches" and missing the entire motivation behind Counterfactual Mugging. It is definitely not supposed to directly make sense in the exact situation you're in. I think this is part of why a variant of it is called "updateless", because it makes a principled refusal to update on which world you find yourself in in order to (more flippant not-quite-right description) program the type of AIs that would win weird games played against omniscient entities.

If the demon would only create me conditional on me cutting off my legs after I existed, and it was the specific class of omniscient entity that FDT is motivated by winning games with, then I would endorse cutting off my legs in that situation. 

(as a not-exactly-right-but-maybe-helpful intuition pump, consider that if the demon isn't omniscient - but simply reads the EA Forum - or more strictly can predict the text that will appear on the EA Forum years in the future - it would now plan to create me but not you, and I with my decision theory would be better off than you with yours. And surely omniscience is a stronger case than just reads-the-EA-Forum!)

If this sounds completely stupid to you and you haven't yet read the LW posts on Counterfactual Mugging, I would recommend starting there; otherwise, consider finding a competent and motivated FDT proponent (ie not me) and trying to do some kind of double-crux or debate with them. I'd be interested in seeing the results.

I won't comment on the overall advisability of this piece, but I think you're confused about the decision theory (I'm about ten years behind state of the art here, and only barely understood it ten years ago, so I might be wrong).

The blackmail situation seems analogous to the Counterfactual Mugging, which was created to highlight how Eliezer's decision theories sometimes (my flippant summary) suggest you make locally bad decisions in order to benefit versions of you in different Everett branches. Schwartz objecting "But look how locally bad this decision is!" isn't telling Eliezer anything he doesn't already know, and isn't engaging with the reasoning. I think I would pay Omega in Counterfactual Mugging; I agree Schwartz's case is harder, but provisionally I think it unintentionally adds a layer of Pascal's Wager + torture vs. dust specks by making the numbers so extreme, which are two totally unrelated reasoning vortices.
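Counterfactual Mugging is easiest to see as arithmetic. As I understand the usual statement, Omega flips a fair coin; on tails it asks you for $100, and on heads it pays you $10,000 only if it predicts you would have paid on tails. Treat those figures as illustrative. This minimal sketch (function names are hypothetical) contrasts scoring the whole policy before the flip with scoring the single act after you've already seen tails.

```python
from fractions import Fraction

ASK = 100         # what Omega asks you to pay if the coin lands tails
REWARD = 10_000   # what Omega pays on heads, only to agents it predicts would pay
P_HEADS = Fraction(1, 2)


def policy_value(pays_on_tails: bool) -> Fraction:
    """Expected value of a policy, scored before the coin flip (updateless view)."""
    heads_payoff = REWARD if pays_on_tails else 0
    tails_payoff = -ASK if pays_on_tails else 0
    return P_HEADS * heads_payoff + (1 - P_HEADS) * tails_payoff


def value_after_tails(pays: bool) -> int:
    """Value of the act, scored after you already know the coin landed tails."""
    return -ASK if pays else 0


print(policy_value(True), policy_value(False))            # whole-policy view
print(value_after_tails(True), value_after_tails(False))  # local view
```

Paying looks strictly worse once you've updated on tails (-100 vs 0), but as a policy it is worth 4950 in expectation vs 0 for refusing. "Look how locally bad this decision is" is the second comparison; the argument for paying rests on the first.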

I think the "should you procreate to make your father procreate?" question only works if your father's cognitive algorithms are perfectly correlated with yours, which no real father's are. To make the example fair, it should be more like "You were created by Omega, a god who transcends time. It resolved to create you if and only if It predicted that you would procreate, and It is able to predict everything perfectly. Now should you procreate?" I would also accept "You were created by a clone of yourself in the exact same situation, down to the atom, that you find yourself in now, including worrying about being created by a clone of yourself and so on. Should you procreate?" In both of these, the question seems much more open than with a normal human father.

If Eliezer's decision theories make no sense and are ignoring easy disproofs, then everyone else who finds them compelling (or at least not obviously wrong) after long study, including people like Wei Dai, Abram Demski, Scott Garrabrant, Benya Fallenstein, etc, is also bizarrely and inexplicably wrong. This is starting to sound less like "Eliezer is a uniquely bad reasoner" and more like "there's something in the water supply here that makes extremely bright people with math PhDs make simple dumb mistakes that any rando can notice."

Thanks for writing this.

I understand why you can't go public with applicant-related information, but is there a reason grantmakers shouldn't have a private Slack channel where they can ask things like "Please PM me if any of you have any thoughts on John Smith, I'm evaluating a grant request for him now"?

Okay, so GWWC, LW, and GiveWell, what are we going to do to reverse the trend?

Seriously, should we be thinking of this as "these sites are actually getting less effective at recruiting EAs" or as "there are so many more recruitment pipelines now that it makes sense that each one would drop in relative importance" or as "any site will naturally do better in its early years as it picks the low-hanging fruit in converting its target population, then do worse later"?

The first point seems to be saying that we should factor in the chance that a program works into cost-effectiveness analysis. Isn't this already a part of all such analyses? If it isn't, I'm very surprised by that and think it would be a much more important topic for an essay than anything about PEPFAR in particular.
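A minimal sketch of what "factoring in the chance that a program works" means, assuming expected cost-effectiveness is just probability of success times benefit, divided by cost. The two programs and all their numbers are made up; the point is only that discounting by implementation risk can reverse a naive ranking.

```python
from dataclasses import dataclass


@dataclass
class Program:
    """A hypothetical intervention; every field here is an illustrative assumption."""
    name: str
    cost: float              # dollars spent
    benefit_if_works: float  # e.g. DALYs averted if the program succeeds
    p_works: float           # probability the program works as intended

    def naive_ce(self) -> float:
        """Benefit per dollar, ignoring the chance the program fails."""
        return self.benefit_if_works / self.cost

    def expected_ce(self) -> float:
        """Benefit per dollar after discounting by the chance it works."""
        return self.p_works * self.benefit_if_works / self.cost


a = Program("A", cost=1_000, benefit_if_works=100, p_works=0.3)
b = Program("B", cost=1_000, benefit_if_works=50, p_works=0.9)

best_naive = max((a, b), key=Program.naive_ce).name
best_expected = max((a, b), key=Program.expected_ce).name
print(best_naive, best_expected)  # A B
```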

The second point, that people should consider whether a project is politically feasible, is well taken. It sounds like the lesson here is "if you find yourself in a situation where you have to recommend either project A or B, and both are good, but A is better than B, but if you do activism for A it still won't happen, but if you do activism for B your input will push it over the edge into happening, do activism for B." I agree with this as far as it goes, but there seem like some important caveats:

  • You shouldn't lie, for the normal reasons against lying. It sounds like some of these economists were just publishing reports or articles saying that antimalarials were better than PEPFAR, and I think if that was what they found, then publishing that in reports is correct regardless of the politics. Then when they put on their activist hats they can support whatever is most effective to support.
  • It's a really hard problem to know whether to pursue less ambitious vs. more ambitious goals. If you're a socialist, should you spend your time fighting for a $1 minimum wage increase, for Medicare For All, or for completely dismantling capitalism and replacing it with workers' communes? I don't think there's an obvious answer, even though the first is clearly more likely to succeed than the other two. In retrospect it's clear that PEPFAR worked politically, and if you say that more cost-effective options wouldn't have worked politically at that time then I believe you, but I don't want to conclude that therefore less ambitious political projects are always better than more ambitious ones, and I don't know how else to apply the lesson from this story more generally.
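The decision rule in the second point is about marginal impact: what matters is how much your activism changes the probability that a project happens, times how good the project is if it does. A minimal sketch, with a hypothetical function name and made-up numbers (A is better if it passes but your input barely moves it; B is worse but your input could push it over the edge). It deliberately ignores both caveats, honesty and how hard these probabilities are to estimate for ambitious goals.

```python
def marginal_value(value_if_passes: float, p_without_you: float, p_with_you: float) -> float:
    """Expected good done by your activism: project value times the probability shift you cause."""
    return value_if_passes * (p_with_you - p_without_you)


# Hypothetical projects: A is better if it passes, but won't happen either way;
# B is worse, but your input moves it from unlikely to likely.
projects = {
    "A": marginal_value(10.0, p_without_you=0.01, p_with_you=0.02),
    "B": marginal_value(6.0, p_without_you=0.40, p_with_you=0.90),
}
print(max(projects, key=projects.get))
```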