AGI - alignment - paperclip maximizer - pause - defection - incentives

Mars Robertson

I would like to expose myself to critique.

I hope this is a place where I can receive some feedback + share some of the insights that came to me.

https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect - "people with low ability, expertise, or experience regarding a certain type of task or area of knowledge tend to overestimate their ability or knowledge"

I'm somewhere on the spectrum 🤡

1. AGI alignment metrics

To avoid outcomes like the paperclip maximizer, or an AI that solves climate change by eliminating humans, I suggest the following core value: LIFE

I've embedded this principle into Network State Genesis and described it in the founding document in the following way:

1. No killing (universally agreed across legal systems and religions)
2. Health (including mental health, longevity, happiness, wellbeing)
3. Biosphere, environment, other living creatures
4. AI safety
5. Mars: backup civilization is fully aligned with the virtue of life preservation

These principles were written for the Network State, and I think the first three of them can be repurposed for AGI alignment.

(another core belief is GRAVITY - I believe in GRAVITY - GRAVITY brought us together)

2. Pause, defection, incentives

A new proof-of-X algorithm could help ensure compliance with an AI moratorium by showing that supercomputers are not being used to train more powerful models.

Proof-of-work consumes loads of energy.

It could be a mixture of algorithms that is more energy-friendly:

peak power for a short amount of time (solve something complex quickly)
proof of storage

A mixture of different algorithms would ensure that various elements of the data centre are rendered unavailable for other purposes. I know too little about the challenges of operating a data centre and too little about training AI; ultimately, I do not know.
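
To make the idea a bit more concrete, here is a minimal Python sketch of the two ingredients listed above, under heavy assumptions. The "solve something complex quickly" part is modelled as a hashcash-style hash puzzle, and the "proof of storage" part as a naive challenge-response check where the verifier keeps its own copy of the data. All function names are hypothetical, real proof-of-storage schemes are far more elaborate, and nothing here shows that a data centre could not also train a model on the side.

```python
import hashlib
import os


def solve_pow(challenge: bytes, difficulty_bits: int) -> int:
    """Brute-force a nonce so SHA-256(challenge + nonce) has
    `difficulty_bits` leading zero bits (hashcash-style puzzle)."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1


def verify_pow(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checking a solution costs a single hash, unlike finding it."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))


def storage_response(stored: bytes, challenge: bytes) -> str:
    """The prover can only answer a fresh challenge if it still holds `stored`."""
    return hashlib.sha256(challenge + stored).hexdigest()


def verify_storage(reference: bytes, challenge: bytes, response: str) -> bool:
    """Naive assumption: the verifier keeps a full reference copy."""
    return storage_response(reference, challenge) == response


if __name__ == "__main__":
    challenge = os.urandom(16)   # fresh randomness from the verifier
    nonce = solve_pow(challenge, 12)
    print("work ok:", verify_pow(challenge, nonce, 12))

    filler = os.urandom(1024)    # stand-in for data filling the disks
    answer = storage_response(filler, challenge)
    print("storage ok:", verify_storage(filler, challenge, answer))
```

The asymmetry is the point: solving is expensive and checking is cheap, so an auditor could issue challenges at random times without needing comparable hardware.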

I’m just aware that there is an incentive to defect and no obvious way to enforce the rules.

So much easier to prove the existence of aliens.
So much more difficult to disprove it.
So much easier to prove you did the thing.
So much more difficult to prove you did not.

Comments (2)

Mars Robertson, Apr 13 2023

WOW

Something new dropped: https://twitter.com/FLIxrisk/status/1646539796527951872

Direct link to the policy: https://futureoflife.org/wp-content/uploads/2023/04/FLI_Policymaking_In_The_Pause.pdf

My reply: https://twitter.com/marsxrobertson/status/1646583463493992462

I'm deeply in the "don't trust, verify" camp.

Monitor the energy usage.

Climate change is real and we need to cut emissions anyway.

My assumption is: "it takes computing power to train the AI", and computing power takes energy.
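
A rough Python sketch of what "monitor the energy usage" could mean in practice: scan a facility's hourly power readings and flag long stretches of sustained high draw, on the assumption that a big training run shows up as one. The function name, the threshold, and the readings in the example are all made up for illustration.

```python
from typing import List, Tuple


def sustained_high_load(
    readings_mw: List[float], threshold_mw: float, min_hours: int
) -> List[Tuple[int, int]]:
    """Return (start, end) hour indices (end exclusive) of every run where
    the draw stays at or above `threshold_mw` for at least `min_hours`."""
    runs = []
    start = None
    for hour, value in enumerate(readings_mw):
        if value >= threshold_mw:
            if start is None:
                start = hour  # a new high-load run begins
        else:
            if start is not None and hour - start >= min_hours:
                runs.append((start, hour))
            start = None
    # Close a run that lasts until the end of the data.
    if start is not None and len(readings_mw) - start >= min_hours:
        runs.append((start, len(readings_mw)))
    return runs


if __name__ == "__main__":
    # Hypothetical hourly readings in megawatts, not real data.
    hourly = [5, 5, 20, 21, 22, 5, 20, 5, 25, 25, 25]
    print(sustained_high_load(hourly, threshold_mw=18, min_hours=3))
```

A flagged window would only be a prompt for a closer look, since plenty of legitimate workloads draw a lot of power too.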

"Data centres are estimated to be responsible for up to 3% of global electricity consumption today and are projected to touch 4% by 2030." - https://datacentremagazine.com/articles/efficiency-to-loom-large-for-data-centre-industry-in-2023

Mars Robertson, Apr 13 2023

A little bit more explanation / inspiration: https://en.wikipedia.org/wiki/Three_Laws_of_Robotics

1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.

2. A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.

3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

Another inspiration: https://earthbound.report/2018/01/15/elinor-ostroms-8-rules-for-managing-the-commons/

Laws of AI alignment:

Humans. Health. Mental Health. Happiness. Wellbeing. Nature. Environment.

Buying us enough time to figure out what's next...

I guess there are not that many AI ethicists: https://forum.effectivealtruism.org/posts/5LNxeWFdoynvgZeik/nobody-s-on-the-ball-on-agi-alignment

What is the Schelling point? This Forum? Less Wrong? Stack Overflow? Reddit? Some Twitter hashtag: https://twitter.com/marsxrobertson/status/1642235852997681153

Or maybe we can ask the AI?

AI may actually know what the good principles are 🤣