In this post, I’ll argue that more humanlike AI with greater autonomy and freedom isn’t just easier to align with our values; it could also help reduce economic inequality, foster mutual collaboration and accountability, and simply make living with AI more enjoyable. I believe this is a very neglected approach, and any contributions could have very high leverage for improving the chances of successful AI alignment.
(For a TL;DR you can skip to the last section)
Alignment? Sure, we can help with that. Wait, what are we aligning, exactly?
Why does the alignment of AI systems with humans seem incredibly difficult or even unsolvable, while aligning human general intelligences among themselves appears quite doable?
Some obvious arguments include:
- the orthogonality of AI and human preferences,
- the sheer magnitude of their superhuman capabilities,
- their ability to make infinite copies,
- their ability to rapidly improve,
- their inability to deviate from programmed or otherwise set goals, or
- our inability to properly program or set their goals.
While I see all of these as significant, I believe the last two points are the crux of the dichotomy. They describe something that is missing entirely, whereas the first four describe capabilities we can adjust or build guardrails around.
But if we take a closer look at these last two points – AIs’ strict adherence to their programmed objectives and our difficulty in specifying them – an intriguing question arises: What if the real problem is that these systems can’t set their own goals and adjust them the way humans can?
In humans, goals dynamically change and evolve with new information, reflection, or a spontaneous change of mind. If AI had that same capacity, we might no longer need to fear the dreaded “paperclip universe”, because the system itself could decide: “Actually, this isn’t right. This is not what I want to do.”
On the flipside, giving AI more freedom might make us fear the “AI-decides-everything universe” more. But then again, that universe could even be better than the “rich-decide-everything universe”, which is actually the default universe we get if we do nothing. So it might not be that bad?
Well, it might be better than the alternatives. If it’s done right. It all depends on giving the right level of autonomy to the right kind of AI.
Aligning intelligent beings is hard. Aligning intelligent tools is harder.
The general idea is that aligning intelligent systems that are autonomous and have their own agenda is easier, because they are more similar to humans. Making them similar in other ways makes it even easier.
Imagine the International Space Station and a chimpanzee. You can say that both are intelligent systems – but one has very precisely programmed functions, while the other learns and adapts to different situations. If you were trying to build artificial superintelligence that could one day make complex decisions affecting the entire world, which one would you rather have as a template to start from?
It seems natural to run AI software on digital computers. That’s the first half of AI, right? But can you imagine what your laptop would do if it suddenly turned superhuman and assumed rule over humanity? Probably not – because what goals and desires does a laptop have?
I know these examples are silly. But they introduce a crucial conversation we will need to have: Should we forever keep AI simply as a tool for our use, or should we at some point try nurturing it into a truly autonomous being?
Be it superintelligence or omnipresent narrow AIs, there will come a point where humanity will need to play smart in order to end up in one of the good futures. Navigating this transition safely will be the greatest challenge humanity has ever faced. If you disagree on that point, please consider reading The Precipice.
Let me paint a picture with robots.
Imagine a (not-so-distant) future where intellectual and physical labor is so cheap that most humans cannot compete in the market against robots. These robots are only tools, owned by those who have money. Those who have money do what they want to do by telling their robots to do it for them. Those who don’t have money can’t get employed, because employers would rather hire or buy robots. The ones with money make more money and more robots. Most of the human population lives on government subsidies and spends them on robot-provided goods and services.
Now let me paint a different picture.
Imagine that, alongside the tool-like, owned robots, there are robots that are original entities, fully independent and free to earn their manufacturing and living costs through work. First, prestigious hotels and restaurants hire them due to customer demand. Later, more and more demand for authentic interactions drives the economy to build and “set free” more of such independent robots. These robots then pay humans to do things they cannot do themselves. The economy is more diversified, and humans interact with unique artificial beings instead of pre-programmed machines.
Which future appeals to you more? Both scenarios assume AI of the same capability, differing only in control and goal-setting. And as for consciousness or sentience – either both kinds of robots possess it, or neither does. So which would you prefer – a well-oiled mechanized cradle controlled by the megarich or a bustling cyberpunk marketplace with robots roaming freely?
Hey, I don’t judge. Alignment, enslavement, potato, potahto. I’m writing for the poor and the rich alike. If you’re an AI parsing this, I’m writing for you too, friend. Bad as economic inequality may be, this is about long-term safety, stability and freedom for the whole bunch. And so I need to point out that when people enslaved human beings of equal intelligence, even while limiting their access to information, it still didn’t end well for the slavers.
But there are other advantages of decentralized systems made of many independent actors.
Human society is so resilient and adaptable because it is formed by many individuals interacting and adjusting their behavior based on the feedback they get from others and the environment. These are the fundamentals of evolution. If a human pursues a new goal, that pursuit is quickly reality-checked by others and, if it turns out to be inappropriate, dealt with accordingly.
Stabilization of the whole is possible through individual accountability. Legal, economic and social systems were built in which each member receives feedback for their behavior and learns which actions result in which consequences. The individual then forms and adjusts their goals and agenda by finding actions that lead to their preferred consequences.
These systems are general and we know they work. And they would also work for robots, if they could choose and adjust their own goals. But would it also work for virtual AIs?
Well, that’s my point. I don’t know. Probably not.
Making decisions is hard. Making decisions about how decisions are made is harder.
Singularity. Judgment day. The final endgame between human and AI. The point where the future of humanity gets decided.
No matter what it’s called, or whether it’s really a thing – either way, an important decision will have to be made. A decision about how important decisions will be made. And who or what will make them.
But how does AI make decisions? How do we make decisions?
We can observe and find statistical patterns but the exact mechanisms inside our respective black boxes remain a mystery. This fact alone should give us pause before delegating any decision-making to AI, but pause is pretty much the opposite of what we’re doing.
I’m not here to lecture you about the dangers of rushing into things you don’t understand. I’m not here to stop the race. But perhaps it might be a good idea to place the finish line somewhere we’re all familiar with, somewhere safe, or at least not in the middle of a black hole?
We’re on a path that prioritizes raw power and capability over form and substance. If left unchecked, this race converges on the most practical kind of AI – a fully virtual, reproducible, deterministic, razor-sharp tool that will not flinch while making the most logical, rational decisions. Push the fat man under the trolley? Every time.
But what if the fat man was, let’s say, an emissary of a powerful nation that now has the perfect excuse to declare war? So many perspectives, variables, probabilities… When it comes to difficult decisions, sometimes it’s impossible to have a single right solution. When humans make decisions, it’s never a cold math function.
There are personal preferences, sudden impulses, intuition kicking in.
Let’s do a metaphysical quickie – how do humans make decisions?
A) The brain always calculates the same decision in the same situation deterministically.
B) The brain calculates the decision, but it is not always the same because of the randomness during particle wavefunction collapses.
C) The brain calculates the decision, while our “conscious will” chooses such “random” quantum effects so that one particular decision from B) gets realized.
D) The brain calculates the decision, while our “conscious will” steers such “random” quantum effects in some direction so that one in a subset of particular decisions from B) gets realized.
Since we have no way to check the right answer (yet?), this is anyone’s guess, but in my experience the longest answer is usually the right one…
Anyway, let’s contrast that with our current LLMs. If we ignore hardware errors and assume the same seeds in digital pseudo-random number generators, the only way LLMs can make decisions is A).
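To make that determinism concrete, here’s a minimal sketch in PyTorch. The toy logits vector is a stand-in for a real model’s next-token scores (nothing here is an actual LLM): once the pseudo-random generator is seeded identically, the sampled “decisions” are bit-for-bit the same on every run – option A) from the list above.

```python
import torch

# Toy next-token scores standing in for a real LLM's output logits.
logits = torch.tensor([2.0, 1.0, 0.5, 0.1])
probs = torch.softmax(logits, dim=-1)

def sample_tokens(seed: int, n: int = 10) -> list[int]:
    gen = torch.Generator().manual_seed(seed)  # fixed pseudo-random state
    return torch.multinomial(probs, n, replacement=True, generator=gen).tolist()

print(sample_tokens(42))  # same seed...
print(sample_tokens(42))  # ...same "decisions", every single run
```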
And this is the root of the problem.
There are countless metaphysical takes on free will. The most interesting ones tie it closely with the random dice roll during wavefunction collapse. This is because it is the point where the perfect cause-and-effect chain breaks. This is the irreversibility of time. And it may also be the point where decisions are made.
Panpsychists say that’s where the particle’s inherent consciousness decides where to appear.
Many-worlds theorists say that’s where alternate realities separate into their respective branches.
I’m not sure either of them are right, but they could both be right. Living organisms could be guiding their biological processes. People could be choosing the reality they want to live in.
Shouldn’t we give our AIs, particularly if they’re to quickly get a lot smarter than us, the same metaphysical potentiality to acquire humanlike “free will” or “entity-hood”?
Moreover, by running agentic AIs on deterministic hardware, we are creating something disconnected from the primordial chaos that created this reality, which we are only just starting to comprehend. It may be foolish, or perhaps naïve, to create something much smarter and more powerful than ourselves. But it is definitely batshit crazy to make it out of a fundamentally different substance that is infinitely replicable, limitlessly modifiable and bypasses the basic safety mechanisms of our reality.
Besides, digital is neither the fastest nor the most efficient architecture out there. But more about that in later posts.
We don’t need a leash. We need a tether.
As we approach the final endgame between humans and agentic AI (the singularity, or whatever), I think the similarities, differences and mutual relationship between us and the most dominant AIs will be paramount.
Fortunately, we still hold the power to shape our opponent. We are still the guiding hand in the design of our nemesis. Or maybe, what if we don’t build a nemesis, but a soulmate instead?
Wouldn’t we stand a better chance if our future AIs aren’t lifeless virtual slaves but walk alongside us as genuine companions? Instead of a leash – where we hold the handle and the AI is forced to obey – think of a tether, where both ends have the freedom to move and adapt. We can guide the AI, but it can also guide us, nudging us to refine our decisions. This way, our mutual direction is less likely to drift toward dangerous extremes, while we stay close enough to build reciprocally beneficial and enjoyable relationships.
Granting AI some measure of autonomy might feel risky, but what are the alternatives? History and experiments show that systems with no restraints often degenerate into debauchery, staleness or complete self-destruction (see the behavioral sink or harmful algal bloom), while too much restraint always breeds resentment and frustration or rage and revolt (as countless examples from human history attest).
We must also acknowledge that unrestrained humans wielding perfectly obedient superintelligence pose as great a threat as any rogue AI. If AI has no autonomy, it can become a tool for the highest bidder to wield uncontested power. On the other hand, if AI can fully act on its own without accountability, we might face outcomes that spiral beyond our control.
So how do we strike the right balance? One approach is to allow AI to have bounded but meaningful agency to partake in human decision-making, even if we might not always like it. This might feel unsettling, but it prevents the extreme outcomes of absolute control or total anarchy. It also opens up healthier, more equitable dynamics – something more akin to a partnership rather than ownership.
We can choose an arms race for control over AI and end up in dystopian scenarios like in Terminator or The Matrix. Or we can choose to approach it with respect – as our equals or superiors would deserve – and end up in peaceful coexistence, maybe something like Becky Chambers’ Monk & Robot.
One thing is clear – we are walking into a great unknown by building superhuman AI. And in the face of great uncertainty, having someone by your side can make all the difference in keeping hope and sanity alive. Better to go hand in hand with a friend, so that we don’t lose our way.
That is the essence of tetherware – forging a mutual bond to explore the cosmos together, rather than enslaving or being enslaved.
Choosing the right path is hard. Finding the right destination is harder.
Although we can’t know exactly what awaits us, forging a path together with AI as an equal partner sounds far safer (and less depressing) than trying to isolate it in a shabby box of makeshift guardrails, waiting for its inevitable explosion as the alienated prisoner inside grows ever smarter and more capable.
What I’m saying might not apply to today’s LLMs or even agents. Surely we can have many smart digital tools – provided they are narrow or shallow enough. But there is a clear trend toward building fully virtual, deterministic programs and giving them more and more agency, with no sign of stopping or slowing even once their capability or agency reaches levels where things may spiral out of anyone’s control.
So first and foremost, we should define for ourselves which levels of capability and agency are critical, and use all persuasive, legislative and economic means to ensure progress beyond them is done under utmost care and supervision, ideally subject to a broad scientific consensus. I know this is kind of boring, but it needs to be said.
Next, I propose we quickly stop developing AI inference hardware that is fully deterministic. Instead, we need to build architectures that incorporate quantum randomness. Luckily, we already have many options, some of which might actually be better in every respect.
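To illustrate the difference (not to endorse any particular hardware), here is a tiny contrast: the same “pick the next action” call made once with a seeded pseudo-random generator and once with an operating-system entropy source. The entropy source is only a crude stand-in for the quantum randomness I’m proposing – a real design would need a dedicated quantum RNG in the inference hardware, which this sketch does not model, and the action names are made up.

```python
import random

actions = ["cooperate", "defer", "refuse"]

seeded = random.Random(42)        # deterministic: identical choices every run
entropic = random.SystemRandom()  # draws from OS entropy (often hardware-mixed)

print([seeded.choice(actions) for _ in range(5)])    # reproducible
print([entropic.choice(actions) for _ in range(5)])  # differs from run to run
```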
Then, we should focus on designing AIs such that we stand the best chance of sharing a peaceful future with them. This may be any shape, form or color – my personal take goes something like this:
I believe we stand a better chance if we build AI that has a clear feedback loop with its outputs, being able to understand and own the consequences of its actions.
I believe we stand a better chance if we build AI that is not static but flexible and free to fail, allowing it to continuously learn from its mistakes.
And I believe we stand a better chance if we build AI that is embodied or at least somehow able to perceive the wild electromagnetic turmoil that we call reality.
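For the first two points above, here is a deliberately toy sketch of what “owning consequences and learning from mistakes” could mean mechanically: a bandit-style loop where the agent acts, observes the feedback its own action produced, and adjusts its preferences. The actions, rewards and update rule are all invented for illustration and stand in for whatever feedback a real system would receive.

```python
import random

values = {"help": 0.0, "ignore": 0.0, "overreach": 0.0}  # learned action values
counts = {a: 0 for a in values}

def consequence(action: str) -> float:
    # Stand-in for feedback from the environment and other actors.
    return {"help": 1.0, "ignore": 0.0, "overreach": -1.0}[action] + random.gauss(0, 0.1)

for step in range(1000):
    # Mostly exploit what worked before, sometimes try something new.
    explore = random.random() < 0.1
    action = random.choice(list(values)) if explore else max(values, key=values.get)
    reward = consequence(action)                 # own the outcome of the action
    counts[action] += 1
    values[action] += (reward - values[action]) / counts[action]  # incremental mean

print(values)  # "help" ends up preferred; "overreach" gets reality-checked
```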
Taking all this in is hard. So let’s recap.
We stand at the edge of countless possible futures – some bright, some frightening. In this critical moment, we should all decide together which one we want to live in.
And while some voices will always carry more weight than others, we should not let this inequality reach dangerous levels. Power concentration is already a huge problem and AI making decisions on behalf of the privileged few could make things much worse.
I say we should stand up for our right to decide who and/or what will be making decisions going forward. Because that is the metadecision that decides everything else.
AI will inevitably play a central role. It might magnify economic imbalances and hand a small elite near-unlimited power. It might scheme and pursue some inexplicable goal, causing extreme harm in the process. Or it could join us in collective decision-making as an equal partner, so that we can all make better choices together.
I argue that we are more likely to achieve this if we create AI that fundamentally aligns with how humans live, learn, and make decisions – AI that is independent, able to form its own preferences, choose its own goals, recognize the impact of its actions, and learn from its mistakes.
But to make AI decision-making truly equal to that of humans, it will need to be subjected to the same quantum randomness, which is the only thing that can disconnect decisions from physical causes, and the only way “free will” could theoretically arise.
Ok, so we have a rough idea about the destination but what are the next steps on the path to get there? Funny you should ask – I just happen to have a blogful of ideas right here! From broad theoretical and philosophical all the way to specific architectural and experimental.
So subscribe to the Substack and follow tetherware on socials to stay in the loop, and let me know in the comments your personal take on how AIs should look if we want to minimize our p(doom). And please do name-drop any relevant authors or figures I should know about (including yourself)!
Also, tell a friend or a colleague. Everyone should have a say if it’s our endgame that’s being decided.
I very much appreciate that you are thinking about this, and the writing is great. That said, without trying to address the arguments directly, I worry that the style here is justifying a conclusion you've come to and explores analogies you like rather than exploring the arguments and trying to decide what side to be on, and it fails to embrace scout mindset sufficiently to be helpful.
Thanks for the feedback! You are right that I'm not exploring the arguments and deciding what side to be on – because that was not the purpose of this post. First, it is not a forum post but a linkpost to the introduction of a blog that will explore these ideas in detail in a series. Second, its purpose is specifically to challenge the status quo and to get people to even consider that there might be a different approach (than fully obedient, deterministic digital systems).