I'm a mathematician working on collective decision making, game theory, formal ethics, international coalition formation, and a lot of stuff related to climate change. Here's my professional profile.
My definition of value:
I need help with various aspects of my main project, which is to develop an open-source collective decision app, http://www.vodle.it:
I can help by ...
"targeting NNs" sounds like work that takes a certain architecture (NNs) as a given rather than work that aims at actively designing a system.
To be more specific: under the proposed taxonomy, where would a project be sorted that designs agents composed of a Bayesian network as a world model and an aspiration-based probabilistic programming algorithm for planning?
Where in your taxonomy does the design of AI systems go – what high-level architecture to use (non-modular? modular with a perception model, world-model, evaluation model, planning model etc.?), what type of function approximators to use for the modules (ANNs? Bayesian networks? something else?), what decision theory to base it on, what algorithms to use to learn the different models occurring in these modules (RL? something else?), how to curate training data, etc.?
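To make the modular option concrete, here is a minimal sketch of the kind of architecture the question alludes to. The module names (perception, world model, evaluation, planner) come from the question itself; everything else (the class name, the toy dynamics, the action set) is a hypothetical placeholder, not anyone's actual design.

```python
class ModularAgent:
    """Toy modular agent: each design choice from the question is a pluggable part."""

    def __init__(self, perception, world_model, evaluation, planner):
        self.perception = perception    # observations -> state estimate
        self.world_model = world_model  # (state, action) -> predicted next state
        self.evaluation = evaluation    # state -> assessment of that state
        self.planner = planner          # uses the other modules to pick an action

    def act(self, observation):
        state = self.perception(observation)
        return self.planner(state, self.world_model, self.evaluation)


# A deliberately trivial instantiation: scalar states, three actions.
agent = ModularAgent(
    perception=lambda obs: obs,                  # identity "perception"
    world_model=lambda s, a: s + a,              # toy dynamics
    evaluation=lambda s: -abs(s - 10),           # prefer states near 10
    planner=lambda s, wm, ev: max([-1, 0, 1], key=lambda a: ev(wm(s, a))),
)
print(agent.act(8))  # picks action 1, nudging the toy state toward 10
```

Each slot could then be filled differently (an ANN or a Bayesian network as world model, an aspiration-based planner instead of an argmax one, etc.), which is exactly the design space the question asks the taxonomy to locate.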
Small remark regarding your metric "100% minus the probability that the given technological restraint would have occurred without protests" (let's call the latter probability x): this seems to suggest that the protests raised the probability of occurring from x to 100%. But the fact that the event eventually did occur does not mean at all that after the protests it had a probability of 100% of occurring. It could even have had the very same probability of occurring as before the protests, namely x, or an even smaller one, as long as x > 0.
What you would actually want to compare here is the probability of occurring given no protests (x) and the probability of occurring given protests (which would have to be estimated separately).
In short: your numbers overestimate the influence of protests by an unknown amount.
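The gap between the two quantities can be shown with a two-line calculation. The numbers below are purely illustrative, made up for this sketch; only the structure of the comparison matters.

```python
# Illustrative (made-up) probabilities:
x = 0.4                    # P(restraint occurs | no protests) -- the "x" above
p_given_protests = 0.5     # P(restraint occurs | protests), estimated separately

claimed_effect = 1.0 - x               # what the criticized metric reports
actual_effect = p_given_protests - x   # the counterfactual uplift

print(claimed_effect, actual_effect)   # the metric reports 0.6; the uplift is only 0.1
```

With these numbers the metric credits the protests with a 60-percentage-point effect when the actual uplift is 10 points; if p_given_protests were below x, the true effect would even be negative while the metric still reported 1 − x.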
So we're converging...
One final comment on your argument about odds: In our algorithms, specifying an allowable aspiration includes specifying a desired probability of success that is sufficiently below 100%. This is exactly to avoid the problem of fulfilling the aspiration becoming an optimization problem through the backdoor.
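The idea of an aspiration band can be sketched in a few lines. This is only an illustration of the principle (satisfy a success-probability band instead of maximizing), not the actual algorithm from our papers; the function name, the band values, and the fallback rule are all my hypothetical choices here.

```python
import random

def choose_action(actions, success_prob, aspiration=(0.8, 0.95)):
    """Pick uniformly among actions whose success probability lies inside the
    aspiration band, instead of picking the probability-maximizing action.
    The upper bound < 1 prevents the choice from collapsing into maximization."""
    lo, hi = aspiration
    feasible = [a for a in actions if lo <= success_prob[a] <= hi]
    if not feasible:
        # One possible fallback convention: the action closest to the band.
        return min(actions, key=lambda a: min(abs(success_prob[a] - lo),
                                              abs(success_prob[a] - hi)))
    return random.choice(feasible)

probs = {"a": 0.99, "b": 0.9, "c": 0.5}
print(choose_action(list(probs), probs))  # "b": the only action inside the band
```

Note that the near-certain action "a" is deliberately excluded: demanding 99% success would be exactly the optimization-through-the-backdoor the comment warns about.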
Dear Seth, thank you again for your opinion. I agree that many instrumental goals, such as power, would also be helpful for final goals that are not of the type "maximize this or that". But I have yet to see a formal argument showing that they would actually emerge in a non-maximizing agent just as likely as in a maximizer.
Regarding your other claim, I cannot agree that "mismatched goals is the problem". First of all, why do you think there is just a single problem, "the" problem? And then, is it helpful to consider something a "problem" that is an unchangeable fact of life? As long as there is more than one human who is potentially affected by an AI system's actions, and these humans' goals are not matched with each other (which they usually aren't), no AI system can have goals matched to all humans affected by it: if "having matched goals" were transitive, an AI matched to two humans would imply that those two humans' goals match each other, contradicting the assumption. So unless you want to claim that "having matched goals" is not a transitive relation, I am quite convinced that the fact that AI systems will have mismatched goals is not a problem we can solve but a fact we have to deal with.
Dear Seth,
if Yonatan meant it the way you interpret it, I would still respond: where is the evidence that such a reward function exists and guides humans' behavior? I spoke to several high-ranking scientists from psychology and social psychology who very much doubt this. I suspect that the theory that humans aim to maximize reward functions might be a non-testable one, and in that sense "non-scientific" – you might believe in it or not. It helps explain some things, but it is also misleading in other respects. I choose not to believe it until I see evidence.
I also don't agree that optimization is a red herring. It is a real issue, just not the only one, and maybe not the most severe one (if one believes one can separate out the relative severity of several interlinked issues, which I don't). I do agree that powerful agents are another big issue, whether competent or not. But powerful, competent, and optimizing agents are certainly the most scary kind :-)
Hi Seth, thank you for your thoughts!
I totally agree that it's just a start, and I hope to have made clear that it is just a start. If it was not sufficiently clear before, I have now added more text making explicit that of course I don't think dropping the optimization paradigm is sufficient to make AI safe, just that it is necessary. And because it appears necessary and under-explored, I chose to study it for some time.
I don't agree with your 2nd point however: If an agent turns 10% of the world into paperclips, we might still have a chance to survive. If it turns everything into paperclips, we don't.
Regarding the last point:
What about EleutherAI?