The following is a conversation between myself in 2022 and a newer version of myself earlier this year.
 

On AI Governance and Public Policy:

2022 Me: I think we will have to tread extremely lightly with this or, if possible, avoid it completely. One particular concern is the idea of gaining public support. Many countries have an interest in pleasing their constituents, so if executed well, this could be extremely beneficial. However, it runs a high risk of doing far more damage. One major concern is the different mindset needed to conceptualize the problem. Alerting people to the dangers of nuclear war is easier: nukes have been detonated, the visual image of incineration is easy to imagine and can be described in detail, and they or their parents have likely lived through nuclear drills in school. This is closer to trying to explain to someone the dangers of nuclear war before Hiroshima, before the Manhattan Project, and before even TNT was developed. They have to conceptualize what an explosion even is, not simply imagine an explosion at a greater scale. Most people will simply not have the time or the will to try to grasp this problem, so this runs the risk of people calling for action on a problem they do not understand, which will likely lead to dismissal by AI researchers, and possibly to short-sighted policies that don’t actually tackle the problem, or that even make it worse by having the guise of accomplishment.

To make matters worse, there is the risk of polarization. Almost any concern with political implications that has gained widespread public attention runs a high risk of becoming polarized. We are still dealing with the ramifications of well-intentioned but misguided early advocates in the climate change movement two decades ago, who sowed the seeds for making climate policy part of one’s political identity. This could be even more detrimental than a merely uninformed electorate, as it might push people who had no previous opinion on AI to advocate strongly in favor of capabilities acceleration, and to be staunchly against any form of safety policy. Even if executed with the utmost caution, this does not stop other players from using their own power or influence to hijack the movement and lead it astray.

2023 Me: Ah, Me’22, the things you don’t know! Many of Me’22’s concerns I think are still valid, but we’re experiencing what chess players might call a “forced move”. People are starting to become alarmed regardless of what we say or do, so steering that in a direction we want is necessary. The fire alarm is being pulled regardless, and if we don’t try to show some leadership in that regard, we risk less informed voices and blanket solutions winning out. The good news is that “serious” people are going on “serious” platforms and actually talking about x-risk. Other good news is that, from current polls, people are very receptive to concerns over x-risk, and it has not yet split along partisan lines (roughly the same percentage of those concerned appears across various demographics). This is still a difficult minefield to navigate. Polarization could still happen, especially with an election year in the US looming. I’ve also been talking to a lot of young people who feel frustrated at not having anything actionable to do, and if those in AI Safety don’t show leadership, we risk (and indeed are already risking) many frustrated young people taking political and social action into their own hands. We need to be aware that EA/LW might have an Ivory Tower problem, and that even though a pragmatic, strategic, and careful course of action might be better, this might make many feel “shut out” and lead them to steer their own course. Finding a way to make those outside EA/LW/AIS feel included, with steps to help guide and inform them, might be critical to avoiding movement hijacking.

On Capabilities vs. Alignment Research:

2022 Me: While I strongly agree that not increasing capabilities is a high priority right now, I also question whether we risk creating a state of inertia. There are very few domains of safety research that do not risk increasing capabilities. And while capabilities continue to progress every day, we might fail to keep up the speed of safety progress simply because every action risks an increase in capabilities. Rather than a “do no harm” principle, I think counterfactuals need to be examined in these situations, where we must consider whether there is a greater risk if we *don’t* do research in a certain domain.

2023 Me: Oh, oh, oh! I think Me’22 was actually ahead of the curve on this one. This might still be controversial, but I think many got the “capabilities space” wrong. Many AIS-inspired theories that could increase capabilities are for systems that would be safer, more interpretable, and easier to monitor by default. And by not working on such systems we instead got the much more inscrutable, dangerous models by default, because the more dangerous models are easier to build. To quote the vape commercials, “safer != safe”, but I still quit smoking in favor of e-cigarettes because safer is at least safer. This is probably a moot point now, though, since I think it’s likely too late to create an entirely new paradigm in AI architectures. Hopefully Me’24 will be happy to tell me we found a new, 100% safe and effective paradigm that everyone’s hopping on. Or maybe he’ll invent it.


 

Yann talked at the beginning about how their difference in perspectives led to different approaches (open source vs. control/pause). I think a debate about that would probably have been much more productive. I wish someone had asked Melanie which policy proposals motivated by x-risk would run counter to policies for the 'short-term' risks she spoke of, since her main complaint seemed to be that x-risk was "taking the oxygen out of the room", but I don't know concretely which x-risk concerns would actually hurt work on short-term risks.

In terms of public perception, which matters, I think Yann and Bengio came across as more likable, while Max and Melanie several times interrupted other speakers and seemed unnecessarily antagonistic toward the others' viewpoints. I love Max, and think he did the best overall in terms of articulating his viewpoints, but I imagine that put some viewers off.

  1. I would have said no a year ago, but a lot of people are now much more interested in AIS. I think there's a lot of potential for much more funding coming in. The binary of greedy vs. non-greedy humans sounds strange to me. What I can say is that many EA types have the mentality of neglectedness, how they can individually have the most impact, etc. Many EAs would probably say they wouldn't be working on the things they're working on if enough other people were. This is great in isolation, and a mentality I usually hold, but it does have problems. The "greedy" humans have the mentality of "someone else is going to do this, I want to get there first." Individually, this doesn't change much. But if you have multiple people doing this, you get people competing with each other, and usually they push each other to get to the outcome faster.
  2. Yes. But everyone's pushing hard on capabilities right now anyway. This has always been a problem in AIS: we can't really do anything without running into this risk. I think there's a big difference, though, between employees at an org and people starting orgs. I'd be fine with existing orgs attracting talent the way I mentioned, but I wouldn't want to throw money at someone who's only interested in status to start their own org. It's certainly tricky. Like, I can imagine how the leaders of an org could slowly get usurped. Holding current leaders in AIS in prestige can possibly mitigate the risk, where people with senior status in the field function as "gatekeepers". Like, a young physicist who wants to gain clout only for the sake of their own status is still going to have to deal with senior members in the field who might call bs. If enough senior members call bs, that person loses status.

I don't think anyone can win a bidding war against OpenAI right now, because they've established themselves as the current "top dog". Even if some other company can pay more, people would probably still choose to work at OpenAI, just because they're OpenAI. But not everyone can work at OpenAI, so that still gives us a lot of opportunity. I don't think this would be much of a problem, as long as the metrics for success are set. As mentioned above, X gains in interpretability are something that can be demonstrated, and at that point it doesn't matter who does it, or why they do it. Other fields of alignment are harder to set metrics for, but there are still a good number of unsolved sub-problems that are demonstrable if solved. Set the metrics for success, and then you don't have to worry about value drift.

So I think there's a huge difference between other EA causes and AIS. You can probably accomplish a good number of other EA objectives without these things, but I still think trying to make them higher status might be useful. It's a way of signaling what's important in society and what's valued. If I knew a way to make someone working in Pandemic Preparedness as high status as the NBA, I probably would.

That being said, AI is a different beast. Places like San Francisco are filled with people working 70+ hours a week, hungry to get ahead in some way in AI. I'd love to tap into that hunger, with the metric for success being Alignment. It would need to have actual metrics for success, though, like provably solving certain aspects of the problem, or making a huge discovery in interpretability. If someone can accomplish demonstrable huge gains in this, I don't really care what their personal motivations are.

Thank you for taking the time to read and critique this idea. I think this is very important, and I appreciate your thoughtful response.

Regarding how to get current systems to implement/agree to it, I don't think that will be relevant long term. I don't think the mechanisms current institutions use for control can keep up with AI proliferation. I imagine most existing institutions will still exist, but won't have the capacity to do much once AI really takes off. My guess is, if AI kills us, it will happen after a slow-motion coup. Not any kind of intentional coup by AIs, but humans effectively couping themselves because AIs will just be more useful. My idea wouldn't remove or replace any institutions; they just wouldn't be very relevant to it. Some governments might try to actively ban use of it, but those bans would probably be fleeting if the network actually were superior in collective intelligence to any individual AI. If it made their work economically more valuable, they would want to use it. It doesn't involve removing them, or doing much to directly interfere with things they are doing. Think of it this way: recommendation algorithms on social media have an enormous influence on society, institutions, etc. Some try to ban or control them, but most people can still access them if they want to, and no entity really controls them. But no one incorporates the "will of Twitter" into their constitution.

The game board isn't any of the things you mention; I don't think any of them have the capacity to do much to change the board. The current board is fundamentally adversarial, where interacting with it increases the power of other players. We've seen this with OpenAI, Anthropic, etc. The new board would be cooperative, at least at a higher level. How do we make the new board more useful than the current one? My best guess would be the economic advantage of decentralized compute. We've seen how fast the open-source community has been able to make progress. And we've seen how a huge amount of compute gets used on things like mining Bitcoin, even though that compute is wasted on solving math puzzles. Contributing decentralized compute to a collective network could actually have economic value, and I imagine this will happen one way or another, but my concern is it'll end up being for the worse if people aren't actively trying to create a better system. A decentralized network with no safeguards would probably be much worse than anything a major AI company could create.

"But wouldn't the market be distorted by the fact that if everyone ends up dead, there is nobody left alive to collect their prediction-market winnings?"

This seems to be going back to the "one critical shot" approach, which I think is a terrible idea that won't possibly work in the real world under any circumstances. This would be a progression over time, not a case where an AI goes supernova overnight. This might require slower takeoffs, or at least no foom scenarios; making a new board that isn't adversarial might mitigate the potential for foom. What I proposed was my first naive approach, and I've since thought that maybe it's the collective intelligence of the system that should be increasing, not a singleton AI being trained at the center. Most members in that collective intelligence would initially be humans, and over time AIs would become a more and more powerful part of the system. I'm not sure here, though. Maybe there's some third option where there's a foundational model at the lowest layer of the network, but it isn't a singular AI in the normal sense. I imagine a singular AI at the center could give rise to agency, and probably break the whole thing.

"It seems to me that having a prediction market for different alignment approaches would be helpful, but would be VERY far from actually having a good plan to solve alignment."

I agree here. They'd only be good at maybe predicting the next iteration of progress, not at producing a fully scalable solution.

"I feel like we share many of the same sentiments -- the idea that we could improve the general level of societal / governmental decision-making using innovative ideas like better forms of voting, quadratic voting & funding, prediction markets, etc"

This would be great, but my guess is these would progress too slowly to be useful. I don't think mechanism design that has to work through currently existing institutions will happen quickly enough. Technically enforced design might.

I love the idea of shovel-ready strategies, and think we need to be prepared in the event of a crisis. My issue is that even most good strategies seem to deal only with large companies, and don't address the likelihood that such power will fall into the hands of more and more actors.

Most of my experience is in the AI Safety sphere, and for that, I think perks and high salaries are critical. I'd love to see Alignment orgs with more of these things. The issue is we need high talent, and that high talent knows its worth, especially right now. If they can get Business Class working at Meta AI, I'd want to offer them First Class. If you have the money to make it happen, outbidding others for talent and being a talent attractor is important. Perks signal a job is high status. Retreats in luxurious locations signal high status. High status attracts high talent. I can't ask everyone with great talent to work on safety just out of the goodness of their hearts.

I generally agree, regarding the public at large. I'm speaking mostly from the experience of people in the AIS community talking with people working in AI or some related field, and I've found many can often get stuck debating these concepts. The general public, in my experience, seems to get more hung up on concepts like consciousness, sentience, etc.

Maybe the wording was what people found off-putting, but I think the point is correct. AIs haven't really started to get creative yet, and that shouldn't be underestimated. Creativity is expanding the matrix of possibilities. In chess, that matrix remains constrained. Sure, there are physical constraints, but an ASI can run circles around us before it has to resort to reversing entropy.

The following is a conversation between myself in 2022 and a newer version of myself earlier this year.

On the Nature of Intelligence and its "True Name":

2022 Me: This has become less obvious to me as I’ve tried to gain a better understanding of what general intelligence is. Until recently, I always assumed that intelligence and agency were the same thing. But General Intelligence, or G, might not be agentic. Agents that behave like RL optimizers may only be narrow forms of intelligence, without generalizability. G might be something closer to a simulator. From my very naive perception of neuroscience, it could be that our intelligence is not agentic, but just simulates agents. In this situation, the prefrontal cortex not only runs simulations to predict its next sensory input, but might also run simulations to predict inputs from other parts of the brain. In this scenario, “desire” or “goals” might be simulations to better predict narrowly intelligent agentic optimizers. Though the simulator might be myopic, I think this prediction model allows for non-myopic behavior, in a similar way to how GPT has non-myopic behavior despite only trying to predict the next token (it has an understanding of where a future word “should” be within the context of a sentence, paragraph, or story). I think this model of G allows for the appearance of intelligent goal-seeking behavior, long-term planning, and self-awareness. I have yet to find another model for G that allows for all three. The True Name of G might be Algorithm Optimized To Reduce Predictive Loss.

2023 Me: Interesting, Me’22, but let me ask you something: you seem to think this majestic ‘G’ is something humans have but other species do not, and then you name the True Name of ‘G’ to be Algorithm Optimized To Reduce Predictive Loss. Do you *really* think other animals don’t do this? How long is a cat going to survive if it can’t predict where it’s going to land? Or where the mouse’s trajectory is heading? Did you think it was all somehow hardcoded in? But cats can jump up on tables, and those weren’t in the ancestral environment; there’s clearly some kind of generalized form of prediction occurring. Try formulating that answer again, but taboo “intelligence”, “G”, “agent”, “desire”, and “goal”. I think the coherence of it breaks down.

Now, what does Me’23 think? Well, I’m going to take a leaf from my own book and try to explain what I think without the words mentioned above. There are predictive mechanisms in the Universe that can run models of what things in the Universe might do in future states. Some of these predictive mechanisms are more computationally efficient than others. Some will be more effective than others. A more effective and efficient predictive mechanism, with a large input of information about the Universe, could be a very powerful tool. If taken to the theoretical (not physical) extreme, that predictive mechanism would hold models of all possible future states. It could then, by accident or intention, guide outcomes toward certain future states over others.

2022 Me: According to this model, humans dream because the simulator is making predictions without sensory input, gradually creating a bigger and bigger gap from reality. Evidence to support this comes from sensory-deprivation tanks, where humans, despite being awake, have dream-like states. I also find it interesting that people who exhibit schizophrenia, which involves hallucinations (as dreams do), can tickle themselves. Most people can be tickled by others, but not by themselves. But otherwise normal people on LSD can do this, and can also have hallucinations. My harebrained theory is that something is going wrong when initializing new tokens for the simulator, which results in hallucinations from the lack of correction from sensory input, and a weaker sense of self from the lack of correction from RL agents in other parts of the brain.

2023 Me: I don’t want to endorse crackpot theories from Me’22, so I’m just going to speak from feelings and fuzzy intuitions here. I will say hallucinations from chatbots are interesting. When getting one to hallucinate, it seems to kind of “make up reality as it goes along”. You say it’s a unicorn, and it will start coming up with explanations for why it’s a unicorn. You say it told you something it never actually told you, and it will start acting as though it did. I have to admit this has a strange resemblance to dreams. I find myself in New York, but remember that I had been in Thailand that morning, and I hallucinate a memory of boarding a plane. I wonder where I got the plane ticket, and I hallucinate another memory of buying one. These are not well-reasoned arguments, though, so I hope Me’24 won’t beat me up too much about them.

2022 Me: I have been searching for how to test this theory. One interest of mine has been mirrors. 

2023 Me: Don’t listen to Me’22 on this one. He thought he understood something, but he didn’t. Yes, the mirror thing in animals is interesting, but it’s probably a different phenomenon entirely, not the same thing.
