Jörn Stöhler

Comments

My guess is that the crowds were similar, and thus the surveys and the initial forecasts were also similar.

IIRC the report states that there wasn't much updating of forecasts, so the final and initial averages are also naturally close.

Besides that, there was also some deference to literature/group averages, and some participants imitated e.g. the Carlsmith forecast but plugged in their own numbers (I think it was about 1/8th of my group, but I'd need to check my notes).

I kinda speculate that Carlsmith's model may be biased towards producing numbers around ~5% (something about how long chains of conditional probabilities don't work well, because humans fail to imagine each step correctly and thus end up biased towards default probabilities closer to 50% at each step).
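
A toy sketch of that mechanism, using made-up step probabilities rather than Carlsmith's actual estimates: if each step of a six-step conditional chain gets pulled partway towards 50%, chains with very different true products end up producing similar headline numbers in the low-percent range. How tight that band is depends on how strong the pull is and how many steps there are.

```python
# Toy illustration (made-up numbers, not Carlsmith's actual estimates):
# if each step of a six-step conditional chain is pulled partway towards 50%,
# chains with very different true products yield similar headline numbers.
from math import prod

def anchored(steps, anchor=0.5, weight=0.6):
    """Pull each step estimate `weight` of the way towards the anchor."""
    return [weight * anchor + (1 - weight) * p for p in steps]

chains = {
    "doom unlikely": [0.9, 0.3, 0.2, 0.4, 0.5, 0.3],    # true product ~0.3%
    "doom likely":   [0.95, 0.9, 0.8, 0.85, 0.9, 0.8],  # true product ~42%
}

for name, steps in chains.items():
    print(f"{name}: true={prod(steps):.1%}, anchored={prod(anchored(steps)):.1%}")
# True products span ~0.3% to ~42%, but the anchored estimates
# land close together, at roughly 1% and 7%.
```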

I mostly back-chain from a goal that I'd call "make the future go well". This usually maps to value-aligning AI with broad human values, so that the future is full of human goodness and not tainted by my own personal fingerprints.

Ideally, though, we first build an AI that we have enough control over that its operators can make it do something less drastic than determining the entire future of humanity, e.g. bringing AI progress to a halt until humanity pulls itself together and figures out safer alignment techniques. That usually means making it corrigible or tool-like, instead of letting it maximize its aligned values.

So I guess I ultimately want (ii) but really hope we can get a form of (i) as an intermediate step.

When I talk about the "alignment problem", I usually refer to the problem that, by default, we get neither (i) nor (ii).