To push back on this point: presumably, even if grantmaker time rather than money is the binding resource, Redwood also took up grantmaker time from OP (indeed, I'd guess that OP's grantmaker time on RR is much higher than for most other grants, given the board-member relationship). So I don't think this really negates Omega's argument--it is indeed relevant to ask how Redwood looks compared to grants that OP hasn't made.
Personally, I am pretty glad Redwood exists and think their research so far is promising. But I am also pretty disappointed that OP hasn't funded some academics that seem like slam dunks to me, and I think this reflects an anti-academia bias within OP (note: they know I think this and disagree with me). Presumably this is more a discussion for the upcoming post on OP, though, and it doesn't settle whether OP was overvaluing RR or undervaluing other grants (mostly the latter imo, though it seems plausible that OP should have been more critical about the marginal $1M to RR, especially if overhiring was one of their issues).
Thanks for this! I think we still disagree, though. I'll elaborate on my position below, but don't feel obligated to update the post unless you want to.
* The adversarial training project had two ambitious goals: handling an unrestricted threat model, and handling a human-defined threat model (in contrast to, e.g., the synthetic L-infinity threat models that are usually considered; a sketch of that standard setup appears at the end of this comment).
* I think both of these were pretty interesting goals to aim for and at roughly the right point on the ambition-tractability scale (at least a priori). Most research projects are less ambitious and more tractable, but I think that's mostly a mistake.
* Redwood was mostly interested in the first goal, and the second was included somewhat arbitrarily iirc. I think this was a mistake, and it would have been better to start with the simplest possible case for examining the unrestricted threat model. (It's usually a mistake to try to do two ambitious things at once rather than nailing one, more so if one of the things is not even important to you.)
* After the original NeurIPS paper, Redwood moved in this direction and tried a bunch of simpler settings with unrestricted threat models. I was an advisor on this work. After several months with less progress than we wanted, we stopped pursuing this direction. It would have been better to get to the point where we could make this call sooner (after 1-2 months). Some of the slowness was indeed due to unfamiliarity with the literature, e.g. being stuck for a few weeks on something that was isomorphic to a standard gradient hacking issue. My impression (not 100% certain) is that Redwood updated quite a bit in the direction of caring about related literature as a result of this, and I'd guess they'd be a lot faster doing this a second time, although still with room to improve.
Note that by academic standards the project was a "success" in the sense of getting into NeurIPS, although the reviewers seemed to like the human-defined aspect of the threat model most, rather than the unrestricted aspect.
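For readers less familiar with the adversarial robustness framing, here is a minimal, hypothetical sketch (my own illustration in the standard image-classification setting, not Redwood's text-classifier code) of what the "synthetic L-infinity threat model" mentioned above looks like: a PGD attacker confined to a small max-norm ball around the input. The unrestricted, human-defined threat model drops that ball entirely.

```python
# Hypothetical illustration only (not Redwood's code): the standard "synthetic"
# L-infinity threat model used in most adversarial-training papers.
import torch

def pgd_linf_attack(model, x, y, loss_fn, eps=8/255, alpha=2/255, steps=10):
    """Projected gradient descent confined to an L-infinity ball of radius eps around x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()        # step uphill on the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # project back into the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)              # keep inputs in the valid range
        x_adv = x_adv.detach()
    return x_adv

# In an unrestricted, human-defined threat model there is no eps-ball: any input that a
# human judge says exhibits the target property counts as an attack, so the search runs
# over the whole input space and "success" is defined by human labels, not a norm bound.
```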
I'll briefly comment on a few parts of this post since my name was mentioned (lack of comment on other parts does not imply any particular position on them). Also, thanks to the authors for their time writing this (and future posts)! I think criticism is valuable, and having written criticism myself in the past, I know how time-consuming it can be.
I'm worried that your method for evaluating research output would make any ambitious research program look bad, especially early on. Specifically:
> The failure of Redwood's adversarial training project is unfortunately wholly unsurprising given almost a decade of similarly failed attempts at defenses to adversarial robustness from hundreds or even thousands of ML researchers.
I think for any ambitious research project that fails, you could tell a similarly convincing story about how it was "obvious in hindsight" that it would fail. A major point of research is to find ideas that other people don't think will work and then show that they do work! For many of my most successful research projects, people advised me not to work on them because they thought they would predictably fail, and if I had failed they could have said something similar to what you wrote above.
I think Redwood's failures here are ones of execution and not of problem selection--I thought the problem they picked was pretty interesting, but they could have realized much more quickly that the particular approaches they were taking to it were unlikely to pan out. If they had done that, perhaps they would have switched to other approaches that ended up succeeding, or just pivoted to interpretability faster. In any case, I definitely wouldn't want to discourage them or future organizations from using a similar problem selection process.
(If you asked a random ML researcher if the problem seemed feasible, they would have said no. But I wouldn't have used that as a reason not to work on the project.)
> CTO Buck Shlegeris has 3 years of software engineering experience and a limited ML research background.
My personal judgment is that Buck is a stronger researcher than most people with ML PhDs. He is weaker at empirical ML than this baseline, but very strong conceptually in ways that translate well to machine learning. I do think Buck will do best in a setting where he's either paired with a good empirical ML researcher or gains more experience there himself (he's already gotten a lot better in the past year). But overall I view Buck as on par with a research scientist at a top ML university.
Thanks for this thoughtful and excellently written post. I agree with the large majority of what you had to say, especially regarding collective vs. individual epistemics (and more generally on the importance of good institutions vs. individual behavior), as well as concerns about insularity, conflicts of interest, and underrating expertise and overrating "value alignment". I have similarly been concerned about these issues for a long time, but especially concerned over the past year.
I am personally fairly disappointed by the extent to which many commenters seem to be dismissing the claims or disagreeing with them in broad strokes, as the claims generally seem true and important to me. I would value the opportunity to convince anyone in a position of authority in EA that these critiques are both correct and critical to address. I don't read this forum often (I was linked to this thread by a friend), but feel free to e-mail me (jacob.steinhardt@gmail.com) if you're in this position and want to chat.
Also, to the anonymous authors, if there is some way I can support you please feel free to reach out (also via e-mail). I promise to preserve your anonymity.
Thanks for writing this! One thing that might help would be more examples of Phase 2 work. For instance, I think that most of my work is Phase 2 by your definition (see here for a recent round-up). But I am not entirely sure, especially given the claim that very little Phase 2 work is happening. Other stuff in the "I think this counts but not sure" category would be work done by Redwood Research, Chris Olah at Anthropic, or Rohin Shah at DeepMind (apologies to any other people who I've unintentionally left out).
Another advantage of examples is it could help highlight what you want to see more of.
I'm teaching a class on forecasting this semester! The notes will all be online: http://www.stat157.com/
I generally agree with the spirit of empathy in this comment, but I also think you may be misinterpreting Dustin in a similar way to how others are. My understanding is that Dustin is not primarily driven by how other actors might use his funding / public comments against him. Instead, it is something like the following:
"Dustin doesn't want to be continually funding stuff that he doesn't endorse, because he thinks that doing things well and being responsible for the consequences of your actions is intrinsically important. He is a virtue ethicist and not a utilitarian in this regard. He feels that OP has funded things he doesn't endorse enough times in enough areas to not want to extend blanket trust, and thus feels more responsibility than before to evaluate cases himself, to make sure that both individual grants and higher-level funding strategies are aligned with his values. He believes in doing fewer things well than more things poorly, which is why some areas are being cut."
Obviously this could be wrong, and I don't want Dustin to feel any obligation to confirm or deny it. I'm writing it because I'm fairly confident that it's at least more right than the prevailing narrative in the comments, and because the reasoning makes a fair amount of sense to me (and much more sense than the PR-based narrative that many are currently projecting).