I'm a mathematician working on collective decision making, game theory, formal ethics, international coalition formation, and a lot of stuff related to climate change. Here's my professional profile.
My definition of value:
I need help with various aspects of my main project, which is to develop an open-source collective decision app, http://www.vodle.it:
I can help by ...
"targeting NNs" sounds like work that takes a certain architecture (NNs) as a given rather than work that aims at actively designing a system.
To be more specific: under the proposed taxonomy, where would a project be sorted that designs agents composed of a Bayesian network as a world model and an aspiration-based probabilistic programming algorithm for planning?
Where in your taxonomy does the design of AI systems go – what high-level architecture to use (non-modular? modular with a perception model, world-model, evaluation model, planning model etc.?), what type of function approximators to use for the modules (ANNs? Bayesian networks? something else?), what decision theory to base it on, what algorithms to use to learn the different models occurring in these modules (RL? something else?), how to curate training data, etc.?
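To make the modular option concrete, here is a minimal sketch of the kind of architecture the question alludes to. The module names (perception, world model, evaluation, planner) come from the question itself; everything else (the class name, the toy dynamics, the action set) is a hypothetical placeholder, not anyone's actual design.

```python
class ModularAgent:
    """Toy modular agent: each design choice from the question is a pluggable part."""

    def __init__(self, perception, world_model, evaluation, planner):
        self.perception = perception    # observations -> state estimate
        self.world_model = world_model  # (state, action) -> predicted next state
        self.evaluation = evaluation    # state -> assessment of that state
        self.planner = planner          # uses the other modules to pick an action

    def act(self, observation):
        state = self.perception(observation)
        return self.planner(state, self.world_model, self.evaluation)


# A deliberately trivial instantiation: scalar states, three actions.
agent = ModularAgent(
    perception=lambda obs: obs,                  # identity "perception"
    world_model=lambda s, a: s + a,              # toy dynamics
    evaluation=lambda s: -abs(s - 10),           # prefer states near 10
    planner=lambda s, wm, ev: max([-1, 0, 1], key=lambda a: ev(wm(s, a))),
)
print(agent.act(8))  # picks action 1, nudging the toy state toward 10
```

Each slot could then be filled differently (an ANN or a Bayesian network as world model, an aspiration-based planner instead of an argmax one, etc.), which is exactly the design space the question asks the taxonomy to locate.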
Small remark regarding your metric "100% minus the probability that the given technological restraint would have occurred without protests" (let's call the latter probability x): this seems to suggest that the protests raised the probability of occurring from x to 100%. But the fact that the event eventually did occur does not mean at all that after the protests it had a probability of 100% of occurring. It could even have had the very same probability of occurring as before the protests, namely x, or an even smaller one, as long as x > 0.
What you would actually want to compare here is the probability of occurring given no protests (x) and the probability of occurring given protests (which would have to be estimated separately).
In short: your numbers overestimate the influence of protests by an unknown amount.
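The gap between the two quantities can be shown with a two-line calculation. The numbers below are purely illustrative, made up for this sketch; only the structure of the comparison matters.

```python
# Illustrative (made-up) probabilities:
x = 0.4                    # P(restraint occurs | no protests) -- the "x" above
p_given_protests = 0.5     # P(restraint occurs | protests), estimated separately

claimed_effect = 1.0 - x               # what the criticized metric reports
actual_effect = p_given_protests - x   # the counterfactual uplift

print(claimed_effect, actual_effect)   # the metric reports 0.6; the uplift is only 0.1
```

With these numbers the metric credits the protests with a 60-percentage-point effect when the actual uplift is 10 points; if p_given_protests were below x, the true effect would even be negative while the metric still reported 1 − x.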
So we're converging...
One final comment on your argument about odds: In our algorithms, specifying an allowable aspiration includes specifying a desired probability of success that is sufficiently below 100%. This is exactly to avoid the problem of fulfilling the aspiration becoming an optimization problem through the backdoor.
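The idea of an aspiration band can be sketched in a few lines. This is only an illustration of the principle (satisfy a success-probability band instead of maximizing), not the actual algorithm from our papers; the function name, the band values, and the fallback rule are all my hypothetical choices here.

```python
import random

def choose_action(actions, success_prob, aspiration=(0.8, 0.95)):
    """Pick uniformly among actions whose success probability lies inside the
    aspiration band, instead of picking the probability-maximizing action.
    The upper bound < 1 prevents the choice from collapsing into maximization."""
    lo, hi = aspiration
    feasible = [a for a in actions if lo <= success_prob[a] <= hi]
    if not feasible:
        # One possible fallback convention: the action closest to the band.
        return min(actions, key=lambda a: min(abs(success_prob[a] - lo),
                                              abs(success_prob[a] - hi)))
    return random.choice(feasible)

probs = {"a": 0.99, "b": 0.9, "c": 0.5}
print(choose_action(list(probs), probs))  # "b": the only action inside the band
```

Note that the near-certain action "a" is deliberately excluded: demanding 99% success would be exactly the optimization-through-the-backdoor the comment warns about.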
Dear Seth, thank you again for your opinion. I agree that many instrumental goals, such as power, would also be helpful for final goals that are not of the type "maximize this or that". But I have yet to see a formal argument showing that they would actually emerge in a non-maximizing agent just as likely as in a maximizer.
Regarding your other claim, I cannot agree that "mismatched goals is the problem". First of all, why do you think there is just a single problem, "the" problem? And then, is it helpful to consider something a "problem" that is an unchangeable fact of life? As long as there is more than one human who is potentially affected by an AI system's actions, and these humans' goals are not matched with each other (which they usually aren't), no AI system can have goals matched to all humans affected by it: if "having matched goals" were transitive, an AI matched to two humans would imply that those two humans' goals match each other, contradicting the assumption. So unless you want to claim that "having matched goals" is not a transitive relation, I am quite convinced that the fact that AI systems will have mismatched goals is not a problem we can solve but a fact we have to deal with.
Dear Seth,
if Yonatan meant it the way you interpret it, I would still respond: where is the evidence that such a reward function exists and guides humans' behavior? I spoke to several high-ranking scientists from psychology and social psychology who very much doubt this. I suspect that the theory that humans aim to maximize reward functions might be a non-testable one, and in that sense "non-scientific" – you might believe in it or not. It helps explain some things, but it is also misleading in other respects. I choose not to believe it until I see evidence.
I also don't agree that optimization is a red herring. It is a real issue, just not the only one, and maybe not the most severe one (if one believes one can separate out the relative severity of several interlinked issues, which I don't). I do agree that powerful agents are another big issue, whether competent or not. But powerful, competent, and optimizing agents are certainly the most scary kind :-)
Hi Seth, thank you for your thoughts!
I totally agree that it's just a start, and I hope to have made clear that it is just a start. If it was not sufficiently clear before, I have now added more text making explicit that of course I don't think dropping the optimization paradigm is sufficient to make AI safe, just that it is necessary. And because it appears necessary and under-explored, I chose to study it for some time.
I don't agree with your 2nd point however: If an agent turns 10% of the world into paperclips, we might still have a chance to survive. If it turns everything into paperclips, we don't.
Regarding the last point:
What about EleutherAI?