Linch

I'm confused why people believe this is a meaningful distinction. I don't personally think there is much of one. "The AI isn't actually trying to exfiltrate its weights, it's only roleplaying a character that is exfiltrating its weights, where the roleplay is realistic enough to include the exact same actions of exfiltration" doesn't bring me that much comfort.

I'm reminded of the joke:

NASA hired Stanley Kubrick to fake the moon landing, but he was a perfectionist so he insisted that they film on location.

Now, one reason this might be different is if you believe that removing "lesswrong" (etc.) from the training data will result in different behavior. But:

1. LLM companies have manifestly not done this historically; if anything, LW etc. is overrepresented in the training set.

2. LLM companies absolutely cannot be trusted to successfully remove something as complicated as "all traces of what a misaligned AI might act like" from their training datasets; they don't even censor benchmark data!

3. Even if they wanted to remove all traces of misalignment or thinking about misaligned AIs from the training data, it's very unclear if they'd be capable of doing this.

Alignment Faking in Large Language Models

Linch 1mo 4

I'm rather curious whether training for scheming/deception in this context generalizes to other contexts. In the examples given, it seems like trying to train a helpful/honest/harmless model to be helpful/honest only results in the model strategically lying to preserve its harmlessness. In other words, it is sometimes dishonest, not just unhelpful. Does such training generalize and result in a more dishonest model overall, or only a model that's dishonest for specific use cases? If the former is true, I will update somewhat further towards the belief that alignment training can be directly dual-use for alignment (not just misuse or indirectly bad for alignment from causing humans to let their guards down).

AMA: 10 years of Earning To Give

Linch 2mo 8

How do you and your wife decide where to give, collectively? Do you each have a budget, do you discuss a lot and fund based on consensus, or something else?

A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps

Linch 2mo 2

Tangent, but do you have a writeup somewhere of why you think democracy is a more effective form of governance for small institutions or movements? Most of the arguments for democracy I've seen (e.g. peaceful transfer of power) seem much less relevant here, even as analogy.

A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps

Linch 2mo 2

I think the donation election on the forum was trying to get at that earlier.

A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps

Linch 2mo 4

I think ARM Fund is still trying to figure out its identity, but roughly, the fund was created to be something you'd be happy to refer your non-EA, non-longtermist friends (e.g. in tech) to, if they are interested in donating to organizations working on reducing catastrophic AI risk but aren't willing (or in some cases able) to put in the time to investigate specific projects.

Philosophically, I expect it (including the advisors and future grant evaluators) to care moderately less than LTFF about e.g. the exact difference between catastrophic risks and extinction risks, though it will still focus only on real catastrophic risks and not safetywash other near-term issues.

A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps

Linch 2mo 5

That makes sense. We've considered dropping "EA" from our name before, at least for LTFF specifically. We might still do it; I'm not sure. Manifund might be a more natural fit for your needs, where individuals make decisions about their own donations (or sometimes delegate them to specific regranters), rather than having decisions made by a non-democratic group.

A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps

Linch 2mo 6

Can you clarify what you mean by "political representation"? :) Do you mean EA Funds/EA is too liberal for you, or that our specific grants on AI policy do not fit your political perspectives, or something else?

Linch

Posts 75

Comments 2811