https://www.lesswrong.com/users/brook
Thanks for making this! I also feel like I get a lot of value out of quarterly/yearly reviews, and this looks like a nice prompting tool. If you haven't seen it already, you might like to look at Pete Slattery's year-review question list too!
I think this is one reasonable avenue to explore alignment, but I don't want everybody doing it.
My impression is that AI researchers exist on a spectrum from only doing empirical work (of the kind you describe) to only doing theoretical work (like Agent Foundations), and most fall in the middle, doing some theory to figure out what kind of experiment to run, and using empirical data to improve their theories (a lot of science looks like this!).
I think all (or even a majority of) AI safety researchers moving to doing empirical work on current AI systems is unwise, for two reasons:
The first one is the biggy. I can imagine this approach working (perhaps inefficiently) in a world where (1) were false and (2) were true, but I can't imagine it working in any world where (1) holds.
Epistemic status: just a 5-minute collation of some useful sources, with a little explanatory text off the top of my head.
Stampy's answers to "Why is AI dangerous?" and "Why might we expect a superintelligence to be hostile by default?" seem pretty good to me.
To elaborate a little:
Alignment seems hard. Humans value very complex things, which seem both A) difficult to tell an AI to preserve and B) unlikely for an AI to preserve by default.
A number of things seem to follow pretty directly from the idea of 'creating an agent which is much more intelligent than humans':
When you combine these things, you get an expectation that the default outcome of unaligned AGI is very bad for humans -- and an idea of why AI alignment may be difficult.
To take a different approach:
Humans have a pretty bad track record when it comes to refraining from using massively destructive technology. It seems at least plausible that COVID-19 was a lab leak (and its plausibility is enough for this argument). The other key example to me is the nuclear bomb.
What's important is that both of these technologies are relatively difficult to get access to. At least right now, it's relatively easy to get access to state-of-the-art AI.
Why is this important? It's related to the unilateralist's curse. If we think that AI has the potential to be very harmful (which deserves its own debate), then the more people who have access to it, the more likely that harm becomes. Given our track record with harder-to-access technologies, it seems likely from this frame that accelerationism will lead to non-general artificial intelligence being used by humans to do massive harm.