Ozzie Gooen

10504 karma · Berkeley, CA, USA

Bio

I'm currently researching forecasting and epistemics as part of the Quantified Uncertainty Research Institute.

Sequences
1

Ambitious Altruistic Software Efforts

Comments
979

Topic contributions
4

That roughly sounds right to me. 

I think that power/incentives often come first, then organizations and ecosystems shape their epistemics to some degree in order to be convenient. This makes it quite difficult to tell what causally led to what.

At the same time, I'm similarly suspicious of a lot of epistemics. It's obviously not just beliefs that OP likes that will be biased to favor convenience. Arguably a lot of these beliefs just replace other bad beliefs that were biased to favor other potential stakeholders or other bad incentives. 

Generally I'm quite happy for people and institutions to be quite suspicious of their worldviews and beliefs, especially ones that are incentivized by their surroundings. 

(I previously wrote about some of this in my conveniences post here, though that post didn't get much attention.)

Instead of "Goodharting", I like the potential names "Positive Alignment" and "Negative Alignment."

"Positive Alignment" means that the motivated party changes their actions in ways the incentive creator likes. "Negative Alignment" means the opposite.

Whenever there are incentives offered to certain people/agents, there are likely to be cases of both Positive Alignment and Negative Alignment. The net effect could end up either positive or negative.

"Goodharting" is fairly vague and typically just refers to just the "Negative Alignment" portion. 

I'd expect this to make some discussion clearer.
"Will this new incentive be goodharted?" -> "Will this incentive lead to Net-Negative Alignment?" 

Other Name Options

Claude 3.7 recommended other naming ideas like:

  • Intentional vs Perverse Responses
  • Convergent vs Divergent Optimization
  • True-Goal vs Proxy-Goal Alignment
  • Productive vs Counterproductive Compliance

This context is useful, thanks.

Looking back, I think this part of my first comment was poorly worded:
> I imagine that scientists will soon have the ability to be unusually transparent and provide incredibly low rates of fraud/bias, using AI.

I meant 
> I imagine that scientists will [soon have the ability to] [be unusually transparent and provide incredibly low rates of fraud/bias], using AI.

So it's not that this will automatically lead to low rates of fraud/bias, but that AI will help enable that for scientists willing to go along with it. Whether scientists will actually be willing to go along with it is a separate question.

But I think even that probably isn't fair. A better description of my beliefs is something like:

  • I think that LLM auditing tools could be useful for some kinds of scientific research, for communities open to them.
  • I think that in the short term, sufficiently motivated groups could develop these tools and use them to help decrease the level of statistical and algorithmic errors that happen. Correspondingly, I'd expect this to help with fraud.
  • In the long run, as AI approaches human-level intelligence (which I think will likely happen in the next 20 years, but I realize others disagree), I expect that more and more of the scientific process will be automated. I think there are ways this could go very well using things like AI auditing, whereby the results would be much more reliable than those currently produced by humans. There are of course also worlds in which humans do dumb things with the AIs and the opposite happens.
  • I think that, at the very least, AI safety researchers should consider using these kinds of methods, and that the AI safety landscape should investigate efforts to make decent auditing tools.

My core hope with the original message is to draw attention to AI science auditing tools as things that might be interesting/useful, not to claim that they're definitely a major game changer. 
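To make the auditing-tool idea a bit more concrete, here's a minimal sketch of what a single audit pass might look like. The `call_llm` function is just a placeholder for whatever model API one would actually use, and the checklist items are illustrative examples rather than recommendations:

```python
# Minimal sketch of an LLM-assisted audit pass over an analysis script.
# `call_llm` is a placeholder for whichever model API is used; the
# checklist is illustrative, not a recommendation.

AUDIT_CHECKLIST = [
    "Are statistical tests applied to data that was also used to select the hypothesis?",
    "Are exclusion criteria for data points stated and justified?",
    "Do the reported effect sizes match what the code actually computes?",
]

def call_llm(prompt: str) -> str:
    """Placeholder: swap in a real model API call here."""
    raise NotImplementedError

def audit_script(script_path: str) -> str:
    with open(script_path) as f:
        code = f.read()
    prompt = (
        "You are auditing a scientific analysis script. For each question, "
        "answer with a flag (OK / CONCERN / CANNOT TELL) and a one-line reason.\n\n"
        + "\n".join(f"- {q}" for q in AUDIT_CHECKLIST)
        + "\n\nScript:\n" + code
    )
    return call_llm(prompt)

# Hypothetical usage: print(audit_script("analysis/main_regression.py"))
```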

I think this is a significant issue, though I imagine a lot of this can be explained more by the fact that OP is powerful than that it is respected. 

If your organization is highly reliant on one funder, then doing things that funder regards as good is a major factor that will determine if you will continue to get funding, even if you might disagree. So it could make a lot of sense to update your actions towards that funder, more than would be the case if you had all the power.

I think that decentralizing funding is good insofar as the nonprofit gets either more power (to the extent that this is good) or better incentives. There are definitely options where one could get more funding, but that funding could come from worse funders, and then incentives decline.

Ultimately, I'd hope that OP and other existing funders can improve, and/or we get other really high-quality funders. 

This comment strikes me as so different from my view that I imagine you might be envisioning a very specific implementation of AI auditors that I'm not advocating for.

I tried having a discussion with an LLM about this to get some more insight; you can see it here if you like (though I suspect that you won't find this useful, as you seem to not trust LLMs much at all). It wound up suggesting implementations that could still provide benefits while minimizing potential costs.

https://claude.ai/share/4943d5aa-ed91-4b3a-af39-bc4cde9b65ef

> The bigger issue here is with the "auditors" themselves: who's in charge of them? If a working scientist disagrees with what the "auditor" says, what happens? What happens if someone like Elon is in charge, and decides to use the auditors for a political crusade against "woke science", as is currently literally happening right now?

I think this is a very sensible question.

My obvious answer is that the auditors should be held to higher standards than the things they are auditing. This means they should be particularly transparent, and should themselves be open to auditing. For example, the auditing code could be open-source, highly tested, and evaluated by both humans and AI systems.

I agree that there are ways one could do a poor job with auditing. I think this is generally true for most powerful tools we can bring in - we need to be sure to use them well, or else they could do harm.

On your other points - it sounds like you have dramatically lower expectations for AI than I do, or than much of the AI safety community does. I agree that if you don't think AI is very exciting, then AI-assisted auditing probably won't go that far.

From my post:
> this could all be good experimentation on our way to systems that will oversee key AI progress. I ultimately want AI auditors for all risky AI development, but some of that will be a harder sell.

If it's the case that AI auditors won't work, then I assume we wouldn't particularly need to oversee key AI progress anyway, as there's not much to oversee.

I wasn't trying to make the argument that it would definitely be clear when this window closes. I'm very unsure of this. I also expect that different people have different beliefs, and that it makes sense for them to then take corresponding actions. 

As AI improves, there's a window for people to get involved and make changes regarding AI alignment and policy.

The window arguably starts small, then widens as it becomes clearer what to do.

But at some point, as we get too close to TAI, I expect that the window narrows. The key decisions get made by a smaller and smaller group of people, and these people have less ability to get help from others, given the quickening pace of things.

For example, at T minus 1 month, there might ultimately be a group of 10 people with key decision-making authority on the most powerful and dangerous AI project. The 'room where it happens' has become quite small.

This is somewhat similar to tech projects. An ambitious initiative will start with a few people, then slowly expand to hundreds. But over time, decisions get locked into place. Eventually the project goes into a "bug fixing" stage, then a marketing/release phase, after which the researchers will often get re-allocated. Later, execs can still make decisions like killing the project.

One thing this means is that I expect that there could be a decent amount of time where many of us have "basically nothing to do" about AI safety, even though TAI still hasn't happened. I imagine it could still be good for many people to try to grow capital and influence other people in order to create positive epistemics/lock-in, but the key AI safety issues belong to a narrow group.

If it is the case that TAI will happen in 2 years, for example, I imagine very few people will be able to do much at all about the key aspects of AI alignment at that point, especially those not actively working in the field.

Obviously, roles working on legislation with a 5+ year time horizon will stop being relevant over 5 years before TAI. And people working in tech at non-leading labs might not be relevant once it's clear those are non-leading labs.

(I don't mean to discourage people. Rather, I think it's important to realize when one should strive hard, and when one should chill out a bit and focus on other issues. Personally I'm sort of looking forward to the time where I'm extremely confident that I can't contribute much to the most major things. It's basically the part of the project where it's 'in someone else's hands'.)

I imagine that scientists will soon have the ability to be unusually transparent and provide incredibly low rates of fraud/bias, using AI. (This assumes strong AI progress in the next 5-20 years)

  • AI auditors could track everything (starting with some key things) done for an experiment, then flag if there was significant evidence of deception / stats gaming / etc. For example, maybe a scientist has an AI recording their screen whenever it's on, while preserving necessary privacy and throwing out the irrelevant data.
  • AI auditors could review any experimental setups, software, and statistics, and flag any errors they detect.
  • Over time, AI systems will be able to do large parts of the scientific work. We can likely make guarantees of AI-done-science that we can't with humans.

Such systems could hypothetically provide significantly stronger assurances than those argued for by some of the scientific reform communities today (the Open Science movement, for example).

I've been interested in this for some of QURI's work, and would love to see AI-overseen experimentation done in the AI safety world.

Perhaps most important, this could all be good experimentation on our way to systems that will oversee key AI progress. I ultimately want AI auditors for all risky AI development, but some of that will be a harder sell.
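As a rough illustration of the kind of artifact such an auditor might produce (all the names and fields below are invented for the example, not a spec; a real system would need far more care around privacy, provenance, and evaluation of the auditor itself):

```python
# Sketch of a possible audit record an AI auditor could emit for an experiment.
from dataclasses import dataclass, field
from enum import Enum

class Severity(Enum):
    INFO = "info"
    CONCERN = "concern"
    LIKELY_PROBLEM = "likely_problem"

@dataclass
class AuditFlag:
    severity: Severity
    summary: str  # e.g. "outcome variable changed after results were seen"
    evidence_refs: list[str] = field(default_factory=list)  # pointers to logs, not raw private data

@dataclass
class ExperimentAuditRecord:
    experiment_id: str
    artifacts_reviewed: list[str]  # code, preregistration, privacy-filtered screen logs, etc.
    flags: list[AuditFlag]

    def overall_assessment(self) -> str:
        if any(f.severity == Severity.LIKELY_PROBLEM for f in self.flags):
            return "needs human review"
        if any(f.severity == Severity.CONCERN for f in self.flags):
            return "minor concerns"
        return "no issues detected"
```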
