I resonate a lot with this post. Thank you for writing it and giving an opportunity to people like me to express their thoughts on the topic. I'm writing with an anonymous account because publicly stating things like, 'I'm not sure it would be bad for humanity to be destroyed' seems dangerous for my professional reputation. I don't like not being transparent, but the risks here seem too great.
I currently work in an organization dedicated to reducing animal suffering. I've recently wondered a lot whether I should go work on reducing x-risks from AI: it seems there's work where I could potentially be counterfactually useful in AI safety. But after having had about a dozen discussions with people from the AI safety field, I still don't have the gut feeling that reducing x-risks from AI deserves my energy more than reducing animal suffering in the short term.
I am not at all an expert on issues around AI, so take what follows as 'the viewpoint of someone outside the world of AI safety / x-risks trying to form an opinion on these issues, with the constraint of having a limited amount of time to do so'.
The reasons are:
I am primarily a hedonistic utilitarian. For me to have a gut feeling that an action is worth taking, I need to be convinced that its expected value is high in this moral theory.
It does not seem clear to me that reducing x-risks from AI leads to scenarios with a better expected value than the default scenario.
I have the impression that the default scenario is an AI not aligned with what humanity wants, and that perhaps destroys humanity or destabilizes human societies greatly.
The scenario towards which work on AI safety/alignment leads us seems to be a scenario where AI is aligned with the values of humanity. I don't think humanity is actively seeking to make sentient beings suffer, but I think humanity shows very little consideration for the suffering of animals and even much of humanity itself. So I also guess that humanity wouldn't care a lot about digital minds suffering. So given the current values of humanity, it is not at all clear to me that this scenario is better than the scenario where AI 'destroys everything'.
The best possible scenario is the one where AI is aligned with 'good values' (maybe something that starts out rather agnostic about what is good, but with the idea that sentience is a very promising criterion, and which actively seeks out what is good; once it is sufficiently sure it knows what is good, it begins to optimize for it). I think work on AI safety increases the probability that this scenario occurs, but there is a risk that it mainly increases the probability of the previous scenario instead. So I am afraid that people in AI safety maximize the probability of the best possible scenario occurring, but do not maximize overall expected value.
So, to me, the main question for knowing whether work on AI safety is net positive is whether the intermediate scenario, where AI is aligned with current humanity's values but not with 'good values', is better than the scenario where AI is not aligned at all.
The people working in AI safety with whom I've discussed this do not seem to have thought enough, for my taste, about the above. It seems they place more weight than I do on the intrinsic badness of 'humanity goes extinct because of AI'. This makes me feel like I can't defer much to them in my decision-making.
More generally, I have the feeling that working on AI safety is more about "ensuring the future is big" than "ensuring the future is good".
Ultimately, the two questions I would like to answer are:
Is working on conventional AI safety and/or governance net positive in expectation if you mainly endorse hedonistic utilitarianism?
If not, is there another kind of work on AI that is robustly positive if you mainly endorse hedonistic utilitarianism? Does it seem more cost-effective under this moral theory than "conventional" work for reducing animal suffering?