Jan_Kulveit

4116 karmaJoined Dec 2017

Bio

Studying behaviour and interactions of boundedly rational agents, AI alignment and complex systems.

Research fellow at Future of Humanity Institute, Oxford. Other projects: European Summer Program on Rationality. Human-aligned AI Summer School. Epistea Lab.

Posts
27


147

Talking publicly about AI risk

Jan_Kulveit

· 3y ago

Why Simulator AIs want to be Active Inference AIs

Jan_Kulveit

· 3y ago · 9m read

The space of systems and the space of maps

Jan_Kulveit

· 3y ago · 6m read

Cyborg Periods: There will be multiple AI transitions

Jan_Kulveit

· 3y ago

Deontology and virtue ethics as "effective theories" of consequentialist ethics

Jan_Kulveit

· 3y ago · 12m read

113

We can do better than argmax

Jan_Kulveit

· 3y ago · 13m read

Limits to Legibility

Jan_Kulveit

· 3y ago · 6m read

167

Ways money can make things worse

Jan_Kulveit

· 3y ago · 11m read

Continuity Assumptions

Jan_Kulveit

· 3y ago · 5m read

101

Different forms of capital

Jan_Kulveit

· 4y ago · 2m read

Sequences
1

Learning from crisis

Comments
209

Talking publicly about AI risk

Jan_Kulveit3y2

Sorry for the delay in response.

Here I look at it from a purely memetic perspective - you can imagine thinking as a self-interested memeplex. Note I'm not claiming this is the main useful perspective, or that this should be the main perspective to take.

Basically, from this perspective

* the more people think about the AI race, the easier it is to imagine AI doom. Also, the specific artifacts produced by the AI race make people more worried - ChatGPT and GPT-4 likely did more for normalizing and spreading worries about AI doom than all the previous AI safety outreach together.

The more the AI race is a clear reality people agree on, the more attentional power and brainpower you will get.

* but also from the opposite direction: one of the central claims of the doom memeplex is that AI systems will be incredibly powerful in our lifetimes - powerful enough to commit omnicide, take over the world, etc. - and that their construction is highly convergent. If you buy into this, and you are a certain type of person, you are pulled toward "being in this game". Subjectively, it's much better if you - the risk-aware, pro-humanity player - are at the front. The safety concerns of Elon Musk leading to the founding of OpenAI likely did more to advance AGI than all the advocacy of Kurzweil-type accelerationists until that point...

Empirically, the more people buy into the claim that "single powerful AI systems are incredibly dangerous", the more attention goes toward work on such systems.

Both memeplexes share a decent amount of maps, which tend to work as blueprints or self-fulfilling prophecies for what to aim for.

EAG talks are underrated IMO

Jan_Kulveit3y5

Personally, I think the 1:1 meme is deeply confused.

A helpful analogy (thanks to Ollie Base) is with nutrition. Imagine someone hearing that "chia seeds are the nutritionally most valuable food, top rated in surveys" ... and subsequently deciding to eat just chia seeds, and nothing else!

In my view, sort of obviously, an intellectual conference diet consisting of just 1:1s is poor and unhealthy for almost everyone.

Nobody’s on the ball on AGI alignment

Jan_Kulveit3y30

In my view this is a bad decision.

As I wrote on LW

Sorry but my rough impression from the post is you seem to be at least as confused about where the difficulties are as average of alignment researchers you think are not on the ball - and the style of somewhat strawmanning everyone & strong words is a bit irritating.

In particular I don't appreciate the epistemics of these moves together

1. Appeal to seeing things from close proximity: "Then I got to see things more up close. And here’s the thing: nobody’s actually on the friggin’ ball on this one!"
2. Straw-manning and weak-manning what almost everyone else thinks and is doing
3. Use of emotionally compelling words like 'real science' for vaguely defined subjects where the content may be the opposite of what people imagine. Is the empirical alchemy-style ML type of research what's advocated for as the real science?
4. What overall sounds more like the aim is to persuade, rather than explain

I think curating this signals this type of bad epistemics is fine, as long as you are strawmanning and misrepresenting others in a legible way and your writing is persuasive. Also there is no need to actually engage with existing arguments, you can just claim seeing things more up close.

Also to what extent are moderator decisions influenced by status and centrality in the community...
... if someone new and non-central to the community came up with this brilliant set of ideas how to solve AI safety:
1. everyone working on it is not on the ball. why? they are all working on wrong things!
2. promising is to do something very close to how empirical ML capabilities research works
3. this is a type of problem where you can just throw money at it and attract better ML talent
... I doubt this would have a high chance of becoming curated.

Nobody’s on the ball on AGI alignment

Jan_Kulveit3y3

Copy-pasting here from LW.

Sorry but my rough impression from the post is you seem to be at least as confused about where the difficulties are as average of alignment researchers you think are not on the ball - and the style of somewhat strawmanning everyone & strong words is a bit irritating.

Maybe I'm getting it wrong, but it seems the model you have for why everyone is not on the ball is something like "people are approaching it too much from a theory perspective, and the promising approach is very close to how empirical ML capabilities research works" & "this is a type of problem where you can just throw money at it and attract better ML talent".

I don't think these two insights are promising.

Also, again, maybe I'm getting it wrong, but I'm confused how similar you are imagining the current systems to be to the dangerous systems. It seems either the superhuman-level problems (eg not lying in a way no human can recognize) are somewhat continuous with current problems (eg not lying), and in that case it is possible to study them empirically. Or they are not. But different parts of the post seem to point in different directions. (Personally I think the problem is somewhat continuous, but many of the human-in-the-loop solutions are not, and just break down.)

Also, regarding what you find promising, I'm confused about what you think the 'real science' to aim for is - on one hand it seems you think the closer the thing is to how ML is done in practice, the more real science it is. On the other hand, in your view all deep learning progress has been empirical, often via dumb hacks and intuitions (this isn't true imo).

GPTs are Predictors, not Imitators

Jan_Kulveit3y23

(crossposted from Alignment Forum)

While the claim - the task ‘predict next token on the internet’ absolutely does not imply learning it caps at human-level intelligence - is true, some parts of the post and reasoning leading to the claims at the end of the post are confused or wrong.

Let’s start from the end and try to figure out what goes wrong.

GPT-4 is still not as smart as a human in many ways, but it's naked mathematical truth that the task GPTs are being trained on is harder than being an actual human.

And since the task that GPTs are being trained on is different from and harder than the task of being a human, it would be surprising - even leaving aside all the ways that gradient descent differs from natural selection - if GPTs ended up thinking the way humans do, in order to solve that problem.

From a high-level perspective, it is clear that this is just wrong. Part of what human brains are doing is minimising prediction error with regard to sensory inputs. The unbounded version of this task is basically of the same generality and difficulty as what GPT is doing, and is roughly equivalent to understanding everything that is understandable in the observable universe. For example: a friend of mine worked on analysing the data from the LHC, leading to the Higgs detection paper. Doing this type of work basically requires a human brain to have a predictive model of aggregates of outputs of a very large number of collisions of high-energy particles, processed by a complex configuration of computers and detectors.

Where GPT and humans differ is not some general mathematical fact about the task, but differences in what sensory data a human and a GPT are trying to predict, and differences in cognitive architecture and in the ways the systems are bounded. The different landscape of both boundedness and architecture can lead both to convergent cognition (thinking as the human would) and to the opposite: predicting what the human would output in a highly non-human way.

The boundedness is overall a central concept here. Neither humans nor GPTs are attempting to solve ‘how to predict stuff with unlimited resources’, but a problem of cognitive economy - how to allocate limited computational resources to minimise prediction error.

Or maybe simplest:
Imagine somebody telling you to make up random words, and you say, "Morvelkainen bloombla ringa mongo."
Imagine a mind of a level - where, to be clear, I'm not saying GPTs are at this level yet -
Imagine a Mind of a level where it can hear you say 'morvelkainen blaambla ringa', and maybe also read your entire social media history, and then manage to assign 20% probability that your next utterance is 'mongo'.
The fact that this Mind could double as a really good actor playing your character, does not mean They are only exactly as smart as you.
When you're trying to be human-equivalent at writing text, you can just make up whatever output, and it's now a human output because you're human and you chose to output that.
GPT-4 is being asked to predict all that stuff you're making up. It doesn't get to make up whatever. It is being asked to model what you were thinking - the thoughts in your mind whose shadow is your text output - so as to assign as much probability as possible to your true next word.

If I try to imagine a mind which is able to predict my next word when asked to make up random words, and be successful at assigning 20% probability to my true output, I’m firmly in the realm of weird and incomprehensible Gods. If the Mind is imaginably bounded and smart, it seems likely it would not devote much cognitive capacity to trying to model in detail strings prefaced by a context like ‘this is a list of random numbers’, in particular if inverting the process generating the numbers would seem really costly. Being this good at this task would require so much data and cheap computation that this is way beyond superintelligence, in the realm of philosophical experiments.

Overall I think it is a really unfortunate way to think about the problem, where a system which is moderately hard to comprehend (like GPT) is replaced by something much more incomprehensible. Also it seems a bit of a reverse intuition pump - I’m pretty confident most people's intuitive thinking about this ‘simplest’ thing will be utterly confused.

How did we get here?

A human can write a rap battle in an hour. A GPT loss function would like the GPT to be intelligent enough to predict it on the fly.

Apart from the fact that humans are also able to rap battle or impro on the fly, notice that “what would the loss function like the system to do” in principle tells you very little about what the system will do. For example, the human loss function makes some people attempt to predict winning lottery numbers. This is an impossible task for humans and you can’t say much about the human based on this. Or you can speculate about minds which would be able to succeed in this task, but you soon get into the realm of Gods and outside of physics.

Consider that sometimes human beings, in the course of talking, make errors.
GPTs are not being trained to imitate human error. They're being trained to *predict* human error.
Consider the asymmetry between you, who makes an error, and an outside mind that knows you well enough and in enough detail to predict *which* errors you'll make.

Again, from the cognitive economy perspective, predicting my errors would often be wasteful. With some simplification, you can imagine I make two types of errors - systematic and random. Often the simplest way to predict a systematic error would be to emulate the process which led to the error. Random errors are ... random, and a mind which knows me in enough detail to predict which random errors I’ll make seems a bit like the mind predicting the lottery numbers.
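To make the systematic vs. random distinction concrete, here is a minimal toy sketch. Everything in it is an assumption made for illustration: the `SYSTEMATIC` misspelling table, the 10% typo rate, and the word list are all made up. A predictor that emulates the writer's habitual misspellings gets most words right, while the random typos stay unpredictable no matter how well the predictor knows the writer's habits.

```python
import random

# Hypothetical habitual misspellings of one particular writer (illustrative only).
SYSTEMATIC = {"their": "thier", "receive": "recieve"}
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def writer(words, rng, typo_rate=0.1):
    """Produce text with systematic errors plus independent random typos."""
    out = []
    for w in words:
        w = SYSTEMATIC.get(w, w)  # systematic error: same mistake every time
        if rng.random() < typo_rate:  # random error: one character replaced
            i = rng.randrange(len(w))
            w = w[:i] + rng.choice(ALPHABET) + w[i + 1:]
        out.append(w)
    return out


def emulating_predictor(words):
    """Predict by emulating the process behind the systematic errors."""
    return [SYSTEMATIC.get(w, w) for w in words]


def naive_predictor(words):
    """Predict that the writer spells everything correctly."""
    return list(words)


def accuracy(predicted, actual):
    """Fraction of positions where the prediction matches the actual output."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


def run_demo(seed=0, repeats=2500):
    """Return (accuracy of emulating predictor, accuracy of naive predictor)."""
    rng = random.Random(seed)
    intended = ["their", "friends", "receive", "letters"] * repeats
    actual = writer(intended, rng)
    return (accuracy(emulating_predictor(intended), actual),
            accuracy(naive_predictor(intended), actual))


if __name__ == "__main__":
    emulating, naive = run_demo()
    print(f"emulating predictor: {emulating:.3f}, naive predictor: {naive:.3f}")
```

In this toy setup the emulating predictor is capped at roughly one minus the typo rate: closing the remaining gap would mean predicting the random number generator itself, which is the lottery-numbers case.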

Consider that somewhere on the internet is probably a list of thruples: <product of 2 prime numbers, first prime, second prime>.
GPT obviously isn't going to predict that successfully for significantly-sized primes, but it illustrates the basic point:
There is no law saying that a predictor only needs to be as intelligent as the generator, in order to predict the generator's next token.

The general claim that some predictions are really hard and you need superhuman powers to be good at them is true, but notice that this does not inform us about what GPT-x will learn.
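The asymmetry in the quoted prime example can be sketched in a few lines. This is a toy illustration under stated assumptions, not a claim about what any GPT does: the function names, the prime range, and the use of trial division are all choices made here for simplicity. Generating a line of the list takes one multiplication once the primes are chosen, while predicting the factors from the product alone takes a search.

```python
import random


def is_prime(n):
    """Trial-division primality check; fine for the small toy sizes used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def random_prime(lo, hi, rng):
    """Sample candidates until one is prime."""
    while True:
        candidate = rng.randrange(lo, hi)
        if is_prime(candidate):
            return candidate


def generate_triple(rng, lo=10_000, hi=100_000):
    """The generator's job: pick two primes, then multiply once."""
    p, q = sorted((random_prime(lo, hi, rng), random_prime(lo, hi, rng)))
    return p * q, p, q


def predict_factors(n):
    """The predictor's job: recover the primes from the product alone.

    Returns (smaller factor, larger factor, number of divisions tried).
    """
    steps = 0
    d = 2
    while d * d <= n:
        steps += 1
        if n % d == 0:
            return d, n // d, steps
        d += 1
    return n, 1, steps  # n itself is prime


if __name__ == "__main__":
    rng = random.Random(0)
    n, p, q = generate_triple(rng)
    found_p, found_q, steps = predict_factors(n)
    print(f"product={n}, recovered=({found_p}, {found_q}), divisions={steps}")
```

With five-digit primes the search is still cheap; the gap only becomes prohibitive at cryptographic sizes. That fits the point above: the existence of arbitrarily hard prediction tasks says little about which of them a bounded predictor will actually spend capacity on.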

Imagine yourself in a box, trying to predict the next word - assign as much probability mass to the next token as possible - for all the text on the Internet.
Koan: Is this a task whose difficulty caps out as human intelligence, or at the intelligence level of the smartest human who wrote any Internet text? What factors make that task easier, or harder?

Yes this is clearly true: in the limit the task is of unlimited difficulty.

There are no coherence theorems

Jan_Kulveit3y10

You are correct with some of the criticism, but as a side-note, completeness is actually crazy.

All real agents are bounded, and pay non-zero costs for bits, and as a consequence, don't have complete preferences. Complete agents in the real world do not exist. If they existed, the correct intuitive model of them wouldn't be 'rational players' but 'an utterly scary god, much bigger than the universe they live in'.

[Link] How effective altruists ignored risk

Jan_Kulveit3y11

a.
Sequoia led FTX round B in Jul 2021 and had notably more time to notice any irregularities than grant recipients.

b.
I would expect the funds to have much better expertise in something like "evaluating the financial health of a company".

Also, it seems you are somewhat shifting the goalposts. Zoe's paragraph opens with "On Halloween this past year, I was hanging out with a few EAs." It is reasonable to assume the reader will interpret it as hanging out with basically random/typical EAs, and the argument should hold for these people. Your argument would work better if she was hanging out with "EAs working at FTX" or "EAs advising SBF" who could have probably done better than funds on evaluating stuff like how the specific people work.
The EA project is clearly not premised on the idea that it should, for example, "figure out stuff like stock price better than legacy institutions". Quite the contrary - the claim is that while humanity actually invests a decent amount of competent effort in stocks, it comparatively neglects problems like poverty or xrisk.

The number of burner accounts is too damn high

Jan_Kulveit3y9

In my view this is an example of a mistake in bounded/local consequentialism.

From a deontic perspective, there is a coordination problem, where "at least consistent handle" posts can be somewhat costly for the poster, but an atmosphere of earnest discussion among real people has large social benefits. Vice versa, discussion with a large fraction of anonymous accounts - in particular if they are sniping at real people and each other - decreases trust, and is vulnerable to manipulation by sock puppets and nefarious players.

Also, I think there are some virtue ethics costs associated with anonymous posts, roughly in the direction of transparency and integrity.

For example, if I imagine myself anonymously posting something critical received unfavourably by someone, and later, meeting that someone in person, or collaborating on something relevant, I would find it integrity-decreasing to continue hiding the authorship. And if I'd be happy to reveal my identity to the people upset ... why not reveal it directly?

While I don't think these considerations add up to "never post anonymously", I think they are pretty large, and usually much larger than e.g. "small probability of adverse career effects in the EA ecosystem".

Moving community discussion to a separate tab (a test we might run)

Jan_Kulveit3y8

Seems worth trying

At the same time, I don't think the community post / frontpage attention mechanism is the core of what's going on, which is, in my guess, often best understood as a fight between memeplexes over hearts and minds.

[Link] How effective altruists ignored risk

Jan_Kulveit3y12

The quality of reasoning in the text seems somewhat troublesome. Using two paragraphs as an example:

On Halloween this past year, I was hanging out with a few EAs. Half in jest, someone declared that the best EA Halloween costume would clearly be a crypto-crash — and everyone laughed wholeheartedly. Most of them didn’t know what they were dealing with or what was coming. I often call this epistemic risk: the risk that stems from ignorance and obliviousness, the catastrophe that could have been avoided, the damage that could have been abated, by simply knowing more. Epistemic risks contribute ubiquitously to our lives: We risk missing the bus if we don’t know the time, we risk infecting granny if we don’t know we carry a virus. Epistemic risk is why we fight coordinated disinformation campaigns and is the reason countries spy on each other.

Still, it is a bit ironic for EAs to have chosen ignorance over due diligence. Here are people who (smugly at times) advocated for precaution and preparedness, who made it their obsession to think about tail risks, and who doggedly try to predict the future with mathematical precision. And yet, here they were, sharing a bed with a gambler against whom it was apparently easy to find allegations of shady conduct. The affiliation was a gamble that ended up putting their beloved brand and philosophy at risk of extinction.

It appears that a chunk of Zoe's epistemic risk bears a striking resemblance to financial risk. For instance, if one simply knew more about tomorrow's stock prices, they could sidestep all stock market losses and potentially become stupendously rich.

This highlights the fact that gaining knowledge in certain domains can be a difficult task, with big hedge funds splashing billions and hiring some of the brightest minds just to gain a slight edge in simply knowing a bit more about asset prices. It extends to having more info about which companies may go belly up or engage in fraud.

Acquiring more knowledge comes at a cost. Processing knowledge comes at a cost. Choosing ignorance is mostly not a result of recklessness or EA institutional design but a practical choice given the resources required to process information. It's actually rational for everyone to ignore most information most of the time (this is standard econ; check rational inattention and the extensive literature on the topic).

One real question in this space is whether EAs have allocated their attention wisely. The answer seems to be "mostly yes." In the case of FTX, heavyweights like Temasek, Sequoia Capital, and SoftBank with billions on the line did their due diligence but still missed what was happening. Expecting EAs to be better evaluators of FTX's health than established hedge funds is somewhat odd. EAs, like everyone else, face the challenge of allocating attention, and their expertise lies in "using money for good" rather than "evaluating the health of big financial institutions". For the typical FTX grant recipient to assume they need to be smarter than Sequoia or SoftBank about FTX would likely not be a sound decision.
