Akash - very nice post, and helpful for (relative) AI alignment newbies like me.
I would add that many AI alignment experts (and many EAs, actually) seem to assume that everyone getting interested in alignment is in their early 20s and doesn't know much about anything, with no expertise in any other domain.
This might often be true, but there are also some people who get interested in alignment who have already had successful careers in other fields, and who can bring new interdisciplinary perspectives that alignment research might lack. Such people might be a lot less likely to fall into traps 3, 4, 5, and 6 that you mention. But they might fall into other kinds of traps that you don't mention, such as thinking 'If only these alignment kids understood my pet field X as well as I do, most of their misconceptions would evaporate and alignment research would progress 5x faster....' (I've been guilty of this on occasion).
'If only these alignment kids understood my pet field X as well as I do, most of their misconceptions would evaporate and alignment research would progress 5x faster....' (I've been guilty of this on occasion)
+1!
I do wish EAs (not just AI safety folks) wrote more about the ideas, cause areas, and projects they've dismissed.
If no one mentions your pet idea, it's easier to fall into the trap of thinking it's a great idea nobody is seeing, or thinking there's motivated reasoning preventing others from seeing or admitting it, and so on.
Whereas if they explicitly mention it and say it's not worth it at the margin, you can at least discuss and contest that claim. At the very least, you know you're not the first person to come up with it, and some thinking has already gone into it.
From my perspective, new and useful innovations in the past, especially in new fields, came from people with a wide and deep education and skillset that takes years to build, and from fragmented research where no one is necessarily thinking about a very high-level terminal goal.
How sure are you that advice like "don't pursue proxy goals" or "don't spend years getting a degree" is useful for generating a productive field of AI alignment research, and not just for generating people who are vaguely similar to existing researchers who are thought of as successful? Or people who can engage with existing research but will struggle to step outside its box?
After all:
Many existing researchers who have made interesting and important contributions do have PhDs,
And it doesn't seem like we're anywhere close to "solving alignment", so we don't actually know that being able to engage with their research without a much broader understanding is really that useful.
I like this; it's a higher-resolution description of what I think of as "not staying stuck in other people's boxes", or at least part of it.
I plan to try avoiding these over the next few months and live-blog about it, so that people can correct my mistakes early on rather than letting me keep doing the wrong things for long. Any chance you'd like to follow?
B. They lose sight of the terminal goal. The real goal is not to skill-up in ML. The real goal is not to replicate the results of a paper. The real goal is not even to “solve inner alignment.” The real goal is to not die & not lose the value of the far-future.
I'd argue that if inner alignment were solved completely, the rest of the alignment problems would become far easier, if not trivial, to solve.
What do you feel are the downsides of someone spending, say, 2 years more on skilling up than they actually should? Besides grantmaker money paying for their salary (which I agree is valuable and worth preserving), I'm not sure I see a big downside. 2 years is a short span of time relative to the lifetime you could devote to study, and it's also small relative to AI timelines (unless your timelines are really 2035 median or closer).
If we've got maybe 2-3 years left before AGI, then 2 years before starting is indeed a large percentage of that remaining time. Even if we have more like 5-10... it may be better to just start trying to work directly on the problem as best you can than to let yourself get distracted by acquiring general background knowledge.
I agree this would be true if real timelines were 10 years or shorter. But timelines of 10 years or shorter are a fringe view, and students who haven't formed strong inside views on timelines typically won't have a reason to defer to them at the expense of deferring to everyone else who has longer-than-10-year timelines.
Calling my strongly held inside view 'fringe' doesn't carry much weight as an argument for me. Do you have actual evidence for your longer-than-10-year timelines view?
I hold the view that important scientific advancements tend to come disproportionately from the very smartest and most thoughtful people. My hope would be that students smart enough to be meaningfully helpful on the AGI alignment problem would be able to think through and form correct inside views on this.
By fringe I just meant that it's an uncommon view, not that it is right or wrong. In the absence of actually having evaluated people's views well enough, it does make sense to give more popular views more prior probability of being true. (This is what I personally am doing as well.)
My hope would be that students smart enough to be meaningfully helpful on the AGI alignment problem would be able to think through and form correct inside views on this.
Cool, thanks. Sorry for sounding a bit hostile; I'm just really freaked out by my strongly held inside view that we have less than 10 years until some really critical tipping-point stuff happens.
I'm trying to be reasonable and rational about this, but sometimes I react emotionally to comments that seem to be arguing for a 'things will stay status quo for a good while, don't worry about the short term' view.
How do you think people should do this?
This seems reasonable to believe.
No need to apologize; I could have been more considerate too! That does sound like a scary place to be.