mic

cost savings
data privacy, not wanting usage to be tracked
interpretability research (this is EleutherAI's justification for releasing an open-source large language model)
wanting to do things that are prohibited by the API

Race to the Top: Benchmarks for AI Safety

mic 3y 4

For anyone interested, the Center for AI Safety is offering up to $500,000 in prizes for benchmark ideas: SafeBench (mlsafety.org)

Is space colonization desirable? Review of Dark Skies: Space Expansionism, Planetary Geopolitics, and the Ends of Humanity

mic 3y 4

Related: Risks of space colonization (Kovic, 2020).

Four Quotes on Preference Utilitarianism

mic 3y 2

Just so I understand, are all four of these quotes arguing against preference utilitarianism?

EA may look like a cult (and it's not just optics)

mic 3y 5

I'm curious whether EA may be perceived as a cult, while environmentalist and social justice activism are not, primarily because the concerns of EA are much less mainstream.

I appreciate the suggestions on how to make EA less cultish, and I think they are valuable to implement, but I don't think they would have a significant effect on public perception of whether EA is a cult.

How/When Should One Introduce AI Risk Arguments to People Unfamiliar With the Idea?

Answer by mic, Sep 30, 2022, 4

I think AI Risk Intro 1: Advanced AI Might Be Very Bad is great.

AI alignment with humans... but with which humans?

mic 3y 3

I agree, that seems concerning. Ultimately, since the AI developers are designing the AIs, I would guess that they would try to align the AI to be helpful to the users/consumers or to the concerns of the company/government, if they succeed at aligning the AI at all. As for your suggestions "Alignment with whoever bought the AI? Whoever uses it most often? Whoever might be most positively or negatively affected by its behavior? Whoever the AI's company's legal team says would impose the highest litigation risk?" – these all seem plausible to me.

On the separate question of handling conflicting interests: there's some work on this (e.g., "Aligning with Heterogeneous Preferences for Kidney Exchange" and "Aligning AI with Human Norms through Multi-Objective Reinforced Active Learning"), though perhaps not as much as we would like.

Effective altruism is no longer the right name for the movement

mic 3y 2

"But I sometimes have a fear in the back of my mind that some of the attendees who are intrigued by these ideas are later going to look up effective altruism, get the impression that the movement’s focus is just about existential risks these days, and feel duped. Since EA pitches don’t usually start with longtermist ideas, it can feel like a bait and switch."

To avoid the feeling of a bait and switch, I think one solution is to introduce existential risk in the initial pitch. For example, when introducing my student group Effective Altruism at Georgia Tech, I tend to say something like: "Effective Altruism at Georgia Tech is a student group which aims to empower students to pursue careers tackling the world's most pressing problems, such as global poverty, animal welfare, or existential risk from climate change, future pandemics, or advanced AI." It's totally fine to mention existential risk – students still seem pretty interested and happy to sign up for our mailing list.

AI alignment with humans... but with which humans?

mic 3y 8

I think AI alignment isn't really about designing AI to maximize the preference satisfaction of a certain set of humans. I think an aligned AI would look more like an AI which:

is not trying to cause an existential catastrophe or take control of humanity
has had undesirable behavior trained out or adversarially filtered
learns from human feedback about what behavior is more or less preferable
- In this case, we would hope the AI would be aligned to the people who are allowed to provide feedback
has goals which are corrigible
is honest, non-deceptive, and non-power-seeking

mic

Participation 7

Posts 13

Comments 279
