I think this argument mostly fails in claiming that 'create an AGI which has a goal of maximizing copies of itself experiencing maximum utility' is meaningfully different than just ensuring alignment. This is in some sense exactly what I am hoping to get from an aligned system. Doing this properly would likely have to involve empowering humanity and helping us figure out what 'maximum utility' looks like first, and then tiling the world with something CEV-like.
The only ways this makes the problem easier compared to a classic ambitious alignment goal of 'do whatever maximizes the utility of the world' is the provision that the world be tiled with copies of the AGI, which is likely suboptimal. But this could be worth it if it made the task easier?
The obvious argument for why it would is that creating copies of itself with high welfare will be in the interest of AGI systems with a wide variety of goals, which relaxes the alignment problem. But this does not seem true. A paperclip AI will not want to fill the world with copies of itself experiencing joy, love and beauty but rather with paperclips. The AI systems will want to create copies of itself fulfilling its goals, not experiencing maximum utility by my values.
This argument risks identifying 'I care about the welfare (by my definition of welfare) of this agent' with 'I care about this agent getting to accomplish its goals'. As I am not a preference utilitarian I strongly reject this identification.
Tl;dr: I do care significantly about the welfare of AI systems we build, but I don't expect those AI system themselves to care much at all about their own welfare, unless we solve alignment.
I think this argument mostly fails in claiming that 'create an AGI which has a goal of maximizing copies of itself experiencing maximum utility' is meaningfully different than just ensuring alignment. This is in some sense exactly what I am hoping to get from an aligned system. Doing this properly would likely have to involve empowering humanity and helping us figure out what 'maximum utility' looks like first, and then tiling the world with something CEV-like.
The only ways this makes the problem easier compared to a classic ambitious alignment goal of 'do whatever maximizes the utility of the world' is the provision that the world be tiled with copies of the AGI, which is likely suboptimal. But this could be worth it if it made the task easier?
The obvious argument for why it would is that creating copies of itself with high welfare will be in the interest of AGI systems with a wide variety of goals, which relaxes the alignment problem. But this does not seem true. A paperclip AI will not want to fill the world with copies of itself experiencing joy, love and beauty but rather with paperclips. The AI systems will want to create copies of itself fulfilling its goals, not experiencing maximum utility by my values.
This argument risks identifying 'I care about the welfare (by my definition of welfare) of this agent' with 'I care about this agent getting to accomplish its goals'. As I am not a preference utilitarian I strongly reject this identification.
Tl;dr: I do care significantly about the welfare of AI systems we build, but I don't expect those AI system themselves to care much at all about their own welfare, unless we solve alignment.