
Roman Leventov

68 karma

Bio

An independent researcher of ethics, AI safety, and AI impacts. LessWrong: https://www.lesswrong.com/users/roman-leventov. Twitter: https://twitter.com/leventov. E-mail: leventov.ru@gmail.com (the preferred mode of communication).

Comments (19)

Fertility rate may be important but to me it's not worth restricting (directly or indirectly) people's personal choices for.

This is a radical libertarian view that most people don't share. Is it worth restricting people's access to hard drugs? Let's abstract for a moment from the numerous negative secondary effects that come with the fact that hard drugs are illegal, as well as from the crimes committed by drug users: if we imagine that hard drugs could simply be eliminated from Earth completely, with a magic spell, should we do it, or should we "not restrict people's choices"? With AI romantic partners, and other forms of tech, we do have a metaphorical magic wand: we can decide whether such products ever get created or not.

A lot of socially regressive ideas have been justified in the name of "raising the fertility rate" – for example, the rhetoric that gay acceptance would lead to fewer babies (as if gay people can simply "choose to be straight" and have babies the straight way).

The example that you give doesn't work as evidence for your argument at all, because of a direct disanalogy: the "young man" from the "mainline story" which I outlined could want to have kids in the future, or may already want kids when he starts his experiment with an AI relationship, but his experience with the AI partner will prevent him from realising this desire and value over the rest of his life.

I think it's better to encourage people who are already interested in having kids to do so, through financial and other incentives.

Technology, products, and systems are not value-neutral. We are so afraid of consciously shaping our own values that we are happy to offload this to the blind free market, whose objective is not to shape the values that we would most reflectively endorse.

Maybe I'm Haidt- and Humane Tech-pilled, but to me, the widespread addiction of new generations to social media in its present form is a massive problem which could contribute substantially to how the AI transition eventually plays out. Social media directly affects social cohesion, i.e., the ability of society to work out responses to the big questions concerning AI (such as: should we build AGI at all? Should we try to build conscious AIs that are moral subjects? What should the post-scarcity economy look like?), and, indeed, the level of people's interest and engagement in these questions at all.

The "meh" attitude of the EA community towards the issues surrounding social media, digital addiction, and AI romance is still surprising to me, I still don't understand the underlying factors or deeply held disagreements which elicit such different responses to these issues in me (for example) and most EAs. Note that this is not because I'm a "conservative who doesn't understand new things": for example, I think much more favourably of AR and VR, I mostly agree with Chalmers' "Reality Plus", etc.

nowhere near the scale of other problems to do with digital minds if they have equal moral value to people and you don't discount lives in the far future.

I agree with this, but by this token, most issues which EAs concern themselves with are nowhere near the scale of S-risks and other potential problems to do with future digital minds. Also, these problems only become relevant if we decide to build conscious AIs and there is no widespread legal and cultural opposition to that, which is a big "if".

Harris and Raskin talked here about the risk that AI partners will be used for "product placement" or political manipulation, but I'm sceptical about this. These AI partners will surely have a subscription business model rather than a freemium model, and, given how important user trust will be for these businesses, I don't think they will try to manipulate their users in this way.

More broadly speaking, values will surely change; there is no doubt about that. The very values of "human connection" and "human relationships" are eroded by definition if people are in AI relationships. A priori, I don't think value drift is a bad thing. But in this particular case, this value change will inevitably go along with a reduction in the population, which is a bad thing (according to my ethics, and the ethics of most other people, I believe).

This is a sort of more general form of the whataboutism that I considered in the last section. We are not talking just about some abstract "traditional option"; we are talking about the total fertility rate. I think everybody agrees that it's important: conservatives and progressives, long-termists and politicians.

If the argument is that childbirth (full families, and parenting) is not important because we will soon have artificial wombs, which, in tandem with artificial insemination and automated systems for child rearing from birth through adulthood, will give us a "full-cycle automated human reproduction and development system" and make the traditional mode of human being (relationships and kids) "unnecessary" for realising value in the Solar system, then I would say: OK, let's wait until we actually have an artificial womb and then reconsider AI partners (if we get to do it).

My "conservative" side would also say that AI partners (and even AI friends/companions, to some degree!) will harm society because it would reduce the total human-to-human interaction, culture transfer, and may ultimately precipitate the intersubjectivity collapse. However, this is a much less clear story for me, so I've left it out, and don't oppose to AI friends/companions in this post.

[...] we are impressed by [...] ‘Eliciting Latent Knowledge' [that] provided conceptual clarity to a previously confused concept

To me, it seems that ELK is (was) attention-captivating (among the AI safety community) but doesn't rest on a solid basis (logic and theories of cognition and language), and is therefore actually confusing, which prompted at least several clarification and interpretation attempts (1, 2, 3). I'd argue that most people leave the original ELK writings more confused than they were before. So, I'd classify ELK as a mind-teaser and maybe a problem statement (maybe more useful than distracting, or maybe more distracting than useful; it's hard to judge as of now), but definitely not as a great piece of "conceptual clarification" work.

From the AI "engineering" perspective, values/valued states are "rewards" that the agent assigns to itself in order to train (in RL style) its reasoning/planning network (i.e., generative model) to produce behaviours that are adaptive but also that it likes and finds interesting (aesthetics). This RL-style training happens during conscious reflection.
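
To make this picture concrete, here is a minimal toy sketch of my reading of it (not an established algorithm; all names, rewards, and numbers are illustrative assumptions): during "reflection" the agent samples imagined actions from its generative model, scores them with self-assigned rewards (its "values"), and nudges the model towards the actions it valued, REINFORCE-style.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions = 4
policy_logits = np.zeros(n_actions)  # a stand-in for the "generative model" over actions

def self_assigned_reward(action: int) -> float:
    """The agent's own value labels, e.g. a mix of adaptivity and aesthetics (made up here)."""
    preferences = np.array([0.1, 1.0, 0.3, -0.5])
    return float(preferences[action])

def reflect(logits: np.ndarray, n_rollouts: int = 500, lr: float = 0.1) -> np.ndarray:
    """REINFORCE-style update from imagined rollouts scored by the agent's own values."""
    for _ in range(n_rollouts):
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        action = rng.choice(n_actions, p=probs)   # imagined (not executed) action
        reward = self_assigned_reward(action)     # self-assigned "value" label
        grad = -probs
        grad[action] += 1.0                       # d log pi(action) / d logits
        logits = logits + lr * reward * grad
    return logits

print(reflect(policy_logits))  # probability mass shifts towards the self-valued action (index 1)
```

The point of the sketch is only that the "reward" signal originates inside the agent rather than from the environment; everything else is standard policy-gradient machinery.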

Under this perspective, but also more generally, you cannot distinguish between intrinsic and instrumental values because intrinsic values are instrumental to each other, but also because there is nothing "intrinsic" about self-assigned reward labels. In the end, what matters is the generative model that is able to produce highly adaptive (and, ideally, interesting/beautiful) behaviours in a certain range of circumstances.

I think your confusion about the ontological status of values is further corroborated by this phrase from the post: "people are mostly guided by forces other than their intrinsic values [habits, pleasure, cultural norms]". Values are not forces, but rather inferences about some features of one's own generative model (that help to "train" this very model in "simulated runs", i.e., conscious analysis of plans and reflections). However, the generative model itself is effectively the product of environmental influences, development, culture, physiology (pleasure, pain), etc. Thus, ultimately, values are not somehow distinct from all these "forces", but are indirectly (through the generative model) derived from them.

Under the perspective described above, valuism appears to swap the ultimate objective ("good" behaviour) for the "optimisation of metrics" (values). Thus, there is a risk of Goodharting. I also agree with dan.pandori, who noted in another comment that valuism pretty much redefines utilitarianism, whose equivalent in AI engineering is RL.

You may say that I am suggesting an infinite regress, because how is "good behaviour" determined, other than through "values"? Well, as I explained above, it couldn't be through "values", because values are our own creation within our own ontological/semiotic "map". Instead, there could be the following guides to "good behaviour":

  • Good old adaptivity (survival) [roughly corresponds to the so-called "intrinsic value" in the expected free energy functional, under Active Inference]
  • Natural ethics, if it exists (see the discussion here: https://www.lesswrong.com/posts/3BPuuNDavJ2drKvGK/scientism-vs-people#The_role_of_philosophy_in_human_activity). If a "truly" scale-free ethics cannot be derived from basic physics alone, there is still the evolutionary/game-theoretic/social/group level on which we can look for an "optimal" ethical arrangement of the agent's behaviour (and, therefore, of the values that should help to train these behaviours), whose "optimality", in turn, is derived either from adaptivity or aesthetics at the higher system level (i.e., the group level).
  • Aesthetics and interestingness: there are objective, information-theoretic ways to measure these; see Schmidhuber's works (a toy sketch follows this list). Also, this roughly corresponds to the "epistemic value" in the expected free energy functional under Active Inference.
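
Here is a minimal, hedged sketch of one such information-theoretic measure, loosely in the spirit of Schmidhuber's "compression progress" (my simplification, not his exact formulation): the interestingness of new observations is approximated by how much a simple learned predictor's code length for them improves after it updates on them. All names and numbers below are illustrative assumptions.

```python
import numpy as np

def code_length(mean: float, var: float, data: np.ndarray) -> float:
    """Negative log-likelihood of data under a Gaussian predictor, in nats --
    a proxy for how compactly this model can encode the data."""
    var = max(var, 1e-6)
    return float(np.sum(0.5 * np.log(2 * np.pi * var) + (data - mean) ** 2 / (2 * var)))

def compression_progress(history: np.ndarray, new_obs: np.ndarray) -> float:
    """Interestingness of new_obs ~ code length under the old model (fit to history)
    minus code length under the updated model (fit to history + new_obs)."""
    old_mean, old_var = history.mean(), history.var()
    updated = np.concatenate([history, new_obs])
    new_mean, new_var = updated.mean(), updated.var()
    return code_length(old_mean, old_var, new_obs) - code_length(new_mean, new_var, new_obs)

rng = np.random.default_rng(0)
history = rng.normal(0.0, 1.0, size=200)
boring = rng.normal(0.0, 1.0, size=50)   # same distribution as before: little progress
novel = rng.normal(3.0, 1.0, size=50)    # distribution shift: the model improves a lot
print(compression_progress(history, boring))  # close to zero
print(compression_progress(history, novel))   # clearly positive
```

On this toy measure, data the model already predicts well yields little interestingness, while data that forces the model to improve yields a lot, which is the intuition behind treating interestingness as a guide to behaviour.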

If the "ultimate" objective is the physical behaviour itself (happening in the real world), not abstract "values" (which appear only in agent's mind), I think Valuism could be cast as any philosophy that emphasises creation of a "good life" and "right action", such as Stoicism, plus some extra emphasis on reflection and meta-awareness, albeit I think Stoicism already puts significant emphasis on these.

AI safety is a field concerned with preventing negative outcomes from AI systems and ensuring that AI is beneficial to humanity.

This is a bad definition of "AI safety" as a field, which muddies the water somewhat. I would say that AI safety is a particular R&D branch (plus the meta and proxy activities for this R&D field, such as AI safety fieldbuilding, education, outreach and marketing among students, grantmaking, and platform development such as what apartresearch.com are doing) within the gamut of activity that strives to "prevent a negative result of the civilisational AI transition".

There are also other sorts of activity that strive for that more or less directly. Some of them are also R&D (such as governance R&D (cip.org), and R&D in cryptography, infosec, and internet decentralisation (trustoverip.org)), while others are not R&D: good old activism and outreach to the general public (StopAI, PauseAI), good old governance (policy development, the UK foundational model task force), and various "mitigation" or "differential development" projects and startups, such as Optic, Digital Gaia, Ought, as well as social innovations and innovations in education and psychological training of people (I don't know of any good examples of the latter two as of yet, though). See more details and ideas in this comment.

It's misleading to call this whole gamut of activities "AI safety". It's maybe "AI risk mitigation". By the way, 80,000 Hours, despite properly calling this cause "Preventing an AI-related catastrophe", also suggests that the only two ways to apply one's efforts to it are "technical AI safety research" and "governance research and implementation", which is wrong, as I demonstrated above.

Somebody may ask: isn't technical AI safety research a more direct and more effective way to tackle this cause area? I suspect that it might not be, for people who don't work at AGI labs. That is, I suspect that independent or academic AI safety research might be inefficient enough (at least for most people attempting it) that it would be more effective for them to apply themselves to various other activities, such as the "mitigation" or "differential development" projects of the kind described above. (I will publish a post that details the reasoning behind this suspicion later, but for now this comment has the beginning of it.)

There are many more interventions that might work on decades-long timelines that you didn't mention:

  • Collective intelligence/sense-making/decision-making/governance/democracy innovation (and its introduction in organisations, communities, and societies at larger scales), such as https://cip.org
  • Innovation in social network technology that fosters better epistemics and social cohesion rather than polarisation
  • Innovation in economic mechanisms to combat the deficiencies and blind spots of free markets and the modern money-on-money return financial system, such as various crypto projects, or https://digitalgaia.earth
  • Fixing other structural problems of the internet and money infrastructure that exacerbate risks: too much interconnectedness, too much centralisation of information storage, and the tracelessness of money, as I explained in this comment. Possible innovations: https://www.inrupt.com/, https://trustoverip.org/, and other trust-based (cryptocurrency) systems.
  • Other infrastructure projects that might address certain risks, notably https://worldcoin.org, albeit this is a double-edged sword (could it be used for surveillance?)
  • OTOH, fostering better interconnectedness between humans, and between humans and computers, primarily via brain-computer interfaces such as Neuralink. (Also, I think that in the mid to long term, a human-AI merge is the only viable "good" outcome, for humanity at least.) However, this is a double-edged sword (could it be used by AI to manipulate humans or to quickly take them over?)

It's hard to imagine a more general and capability-demanding activity than doing good (superhuman!) science in such an absurdly cross-disciplinary field as AI safety (and among the disciplines involved there are those that are notoriously not very scientific yet: psychology, sociology, economics, the study of consciousness, ethics, etc.). So if there is an AI that can do that but still doesn't count as AGI, I don't know what the heck 'AGI' should even refer to. Compare with chess, which is a very narrow problem that can be formally defined and doesn't require the AI to operate with any science (or world models) whatsoever.

If OpenAI still had a moral compass, and were still among the good guys, they would pause AGI (and ASI) capabilities research until they have achieved a viable, scalable, robust set of alignment methods that have the full support and confidence of AI researchers, AI safety experts, regulators, and the general public.

I disagree with multiple things in this sentence. First, you take a deontological stance, whereas OpenAI clearly acts within a consequentialist stance, assuming that if they don't create 'safe' AGI, reckless open-source hackers will (given the continuing exponential decrease in the cost of effective training compute, and/or the next breakthrough in DNN architecture or training that will make it much more efficient and/or enable effective online training). Second, I largely agree with OpenAI, as well as Anthropic, that iteration is important for building an alignment solution. One probably cannot design a robust, safe AI without empirical iteration, including with increasing capabilities.

I agree with your assessment that the strategy they are taking will probably fail, but mainly because I think we have inadequate human intelligence, human psychology, and coordination mechanisms to execute it. That is, I would support Yudkowsky's proposal: halt all AGI R&D, develop narrow AI and tech for improving the human genome, make humans much smarter (a von Neumann level of intelligence should be just the average) and give them a much more peaceful psychology, like bonobos', reform coordination and collective decision-making, and only then revisit the AGI project with roughly the same methodology as OpenAI proposes, albeit a more diversified one: I agree with your criticism that OpenAI is too narrowly focused on some sort of computationalism, to the detriment of perspectives from psychology, neuroscience, biology, etc. BTW, it seems that DeepMind is more diversified in this regard.
