In this post, I propose an idea that could make whistleblowing more efficient, and so hopefully improve AI safety by getting unsafe practices discovered marginally faster.

I'm looking for feedback, ideas for improvement, and people interested in making it happen.

It has been proposed before that an efficient and trustworthy whistleblowing mechanism would be beneficial. The technology that makes it possible has become easy and convenient. For example, Proof of Organization, built on top of ZK Email, is a message board that lets people who own an email address at their company's domain post without revealing their identity. There is also an application for ring signatures using GitHub SSH keys: it lets you create a signature proving that you own one of the keys in any group you define (e.g., EvilCorp repository contributors), without revealing which one.

However, as one may have guessed, it hasn't been widely used. Hence, when the critical moment arrives, a whistleblower may not be aware of such technology, and even if they are, they probably won't trust it enough to use it. I think trust comes either from the code being audited by a well-established and trusted entity or, more commonly, through practice (e.g., I don't need to verify that a certain password manager is secure if I know that millions use it and no password breaches have been reported).

Hence, I have been considering how to make a privacy-preserving communication tool that would be commonly used, so that everyday use demonstrates its legitimacy and it becomes trusted.

The best idea I have so far is to create a set of Twitter bots, one for each interesting company (or community), where only members of that company or community can post. Depending on the bot, access could be gated by ownership of a LinkedIn account, an email address at a given domain, or, e.g., an LW/AI-Alignment forum account of a certain age.
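
As a rough sketch of what per-bot gating could look like, here is a minimal policy check in Python. Everything in it (`Credential`, `BotPolicy`, the field names, the example domain) is a hypothetical name I made up for illustration. It only shows the access rule: in a real deployment the credential would have to be established by a privacy-preserving proof (ZK Email or ring-signature style) rather than shown to a server.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Credential:
    """What a poster can prove about themselves (hypothetical shape)."""
    email_domain: Optional[str] = None            # assumed to come from an email-domain proof
    forum_account_created: Optional[date] = None  # assumed to come from a forum-account proof


@dataclass(frozen=True)
class BotPolicy:
    """Access rule for one bot; all fields are illustrative assumptions."""
    name: str
    allowed_email_domains: frozenset = frozenset()
    min_forum_account_age_days: Optional[int] = None

    def admits(self, cred: Credential, today: date) -> bool:
        # Rule 1: a proven email address at one of the allowed domains.
        if (cred.email_domain is not None
                and cred.email_domain.lower() in self.allowed_email_domains):
            return True
        # Rule 2: a forum account at least as old as the configured minimum.
        if (self.min_forum_account_age_days is not None
                and cred.forum_account_created is not None):
            age = today - cred.forum_account_created
            return age >= timedelta(days=self.min_forum_account_age_days)
        return False


# Example: one bot gated by company email domain, one by forum account age.
company_bot = BotPolicy(name="evilcorp",
                        allowed_email_domains=frozenset({"evilcorp.example"}))
forum_bot = BotPolicy(name="forum-veterans", min_forum_account_age_days=365)
```

Keeping the policy as plain data would make it easy to publish each bot's rule alongside the bot, so readers know exactly what a post from that bot does and does not prove.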

I imagine this could become viral and interesting in gossipy cases, like the Sam Altman drama or the Biden dropout drama.

Some questions that came up during consideration:

  • How to deal with moderation of the content (if everything is posted, anyone could deliberately post some profanity to get the bot banned)?
    • I would moderate aggressively myself and replace each moderated post with a link to a separate website where all posts get through.
  • How do we balance convenience and privacy?
    • I'd make a hosted open-source tool, which I expect most people would be comfortable using for any gossip that doesn't put their job on the line. It would also come with instructions to download it, run it locally, and submit posts through Tor, etc., for cases where such effort is warranted.
  • What if people use this tool to make false accusations?
    • I do think this is an actual downside, but I hope that the benefits of the tool would be worth it.
  • What if someone creates a fake dialogue, pretending to be two people debating a topic?
    • Although it's technically possible to make a tool that lets you prove that you have not posted before, this functionality shouldn't exist, since otherwise one could be forced to either produce such a proof or confess. Fake dialogues are something to be aware of, but not too much of a problem, in my opinion.
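
The moderation answer above (every post reaches the unfiltered site, and the bot shows either the post or a link to it) can be sketched as follows. This is only an illustration under my own assumptions: `Submission`, `publish`, and `ARCHIVE_URL` are hypothetical names, the archive is a plain dictionary standing in for the separate website, and the moderation decision itself is passed in as a flag.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical address of the separate site where all posts get through.
ARCHIVE_URL = "https://archive.example/posts/"


@dataclass(frozen=True)
class Submission:
    post_id: str
    text: str


def publish(sub: Submission, approved: bool, archive: Dict[str, str]) -> str:
    """Store the post in the unfiltered archive and return the text the bot should post."""
    # Every submission is archived, whether or not it passes moderation.
    archive[sub.post_id] = sub.text
    if approved:
        return sub.text
    # Moderated posts are replaced by a pointer to the unfiltered copy.
    return f"Post withheld by moderation; unfiltered copy: {ARCHIVE_URL}{sub.post_id}"
```

The point of this shape is that moderation only changes what appears on Twitter, never what is recorded, so anyone can check the archive to see what was filtered.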

I'm curious to learn what others think and about other ideas for making a gossip/whistleblower tool that could become widely known and trusted.

Comments (1)

Very interesting idea. Definitely worth discussing.

Brainstorming more possible downsides: disgruntled workers could grief their employers by leaking intellectual property just to spite them. That could, e.g., create an incentive to replace human employees with compliant AIs.

Browsing Blind could help with thinking up more possible downsides. Actually, the lightweight version of this idea might be to just browse Blind all day, and create a social media feed by screenshotting juicy AI gossip.

One bot per company could struggle to gain followers. Lots of bots would likely languish in obscurity. Perhaps try a single bot for everyone, or all Fortune 500 companies, or all companies in a particular industry (such as AI), etc.

I think putting the raw submission feed on a separate site makes sense. Anyone could make a "Hello world" post to test things out and verify that the system works as intended, which would boost credibility. You could then pick out interesting highlights from the unfiltered feed and share them on social media to bring attention to the service and to those particular posts.
