I'm currently researching forecasting and epistemics as part of the Quantified Uncertainty Research Institute.
Kudos for doing this and writing it up!
I've long heard that policy work can be emotionally exhausting; it's good to get a more specific sense of why that is. From what I understand, a lot of people in and around politics can go decades without significant wins to show for it, and many young people leave pretty quickly.
A large reason to focus on opaque components of larger systems is that difficult-to-handle and existentially risky misalignment concerns are most likely to occur within opaque components rather than to emerge from human-built software.
Yep, this sounds positive to me. I imagine it's difficult to do this well, but to the extent it can be done, I expect such work to generalize more than a lot of LLM-specific work.
> I don't see any plausible x-risk threat models that emerge directly from AI software written by humans?
I don't feel like that's my disagreement. I'm expecting humans to create either [dangerous system that's basically one black-box LLM] or [something very different that's also dangerous, like a complex composite system]. I expect AIs can also make either system.
A quick estimate is that global GDP divided by the world population would be about $13,000 per person. (Of course, actually redistributing all income this way would also break the economy, so the figure would soon be lower.) That's about the GDP per capita of El Salvador or Sri Lanka. That doesn't leave much room for funding museums, etc.
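As a quick sanity check of that ~$13,000 figure, here's a back-of-envelope calculation. The GDP and population inputs are my own rough approximations, not numbers from the post:

```python
# Back-of-envelope check of the ~$13,000-per-person figure.
# Inputs are rough approximations I'm assuming, not figures from the post.
GLOBAL_GDP_USD = 105e12     # roughly $105 trillion in global GDP
WORLD_POPULATION = 8.0e9    # roughly 8 billion people

gdp_per_capita = GLOBAL_GDP_USD / WORLD_POPULATION
print(f"~${gdp_per_capita:,.0f} per person")  # ~ $13,125 per person
```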
For what it's worth, I believe that many people can live pretty great lives on $13k/year. I think that healthy-ish Americans today can do this, with a bit of prep and a willingness to live in the right places.
I'd expect that if a magic wand were waved and everyone's income changed to $13k/yr, it would go much better than many people here might imagine. In the US, services would quickly change to cater to less expensive needs. That said, there's a major question of how this would actually work in practice. It's hard to imagine, as it would have profound implications for real estate prices, salaries, living costs, etc.
Happiness is currently correlated with income in the US, but (A) it's not clear how much causation there is, or in which direction, (B) a lot of it comes down to zero-sum comparisons, and (C) even the low end isn't too catastrophic.
https://insights.som.yale.edu/insights/as-incomes-rise-variability-in-happiness-shrinks
Relatedly, incomes in the US have gone up a ton in the last hundred years, but happiness levels have moved surprisingly little.
Lastly, I'd flag that in this world, I'd expect more culture (e.g. museum-like experiences) to be generated each year in total. The extra people with disposable income would offset the losses at the top.
I'm not saying that I don't prefer the extra wealth - just that I don't see a need to feel bad about a picture of the world where everyone has $13k/yr. I think this would be much better than what we have now, in total (those lifted up would gain more than those brought down would lose).
"Feeling good" or "Feeling bad" about the current state of such a complex world is a hard thing to be objectively correct or incorrect about.
Also posted here, where it got some good comments: https://www.facebook.com/ozzie.gooen/posts/pfbid037YTCErx7T7BZrkYHDQvfmV3bBAL1mFzUMBv1hstzky8dkGpr17CVYpBVsAyQwvSkl
One of my main frustrations/criticisms with a lot of current technical AI safety work is that I'm not convinced it will generalize to the critical issues we'll have at our first AI catastrophes ($1T+ damage).
From what I can tell, most technical AI safety work is focused on studying previous and current LLMs. Much of this work is very particular to specific problems and limitations these LLMs have.
I'm worried that the future decisive systems won't look like "single LLMs, similar to 2024 LLMs." Partly, I think it's very likely that these systems will be made up of combinations of many LLMs and other software. If you have a clever multi-level system, you get a lot of opportunities to fix the problems of the specific parts. For example, you can have control systems monitoring LLMs that you don't trust, and you can use redundancy and cross-checking to investigate outputs you're not sure about. (This isn't to say that these composite systems won't have problems - just that the problems will look different from those of the specific LLMs.)
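To make the composite-system idea concrete, here's a toy sketch of what a monitoring-plus-redundancy wrapper around an untrusted LLM might look like. Everything here (the function names, the toy monitor) is a hypothetical illustration I've made up, not a description of any real system:

```python
import random

# Hypothetical sketch of a composite system: an untrusted LLM wrapped in
# monitoring and redundancy layers. All names and behaviors are placeholders.

def untrusted_llm(prompt: str) -> str:
    """Stand-in for a black-box LLM call we don't fully trust."""
    return random.choice(["benign answer", "benign answer", "suspicious answer"])

def monitor_flags_output(output: str) -> bool:
    """Stand-in for a cheaper control layer (rules, a classifier, or a second
    model) that flags outputs it can't verify."""
    return "suspicious" in output

def answer_with_checks(prompt: str, n_samples: int = 3) -> str:
    # Redundancy: sample several candidates instead of trusting a single call.
    candidates = [untrusted_llm(prompt) for _ in range(n_samples)]
    # Control: discard anything the monitor flags.
    safe = [c for c in candidates if not monitor_flags_output(c)]
    if not safe:
        # If everything looks suspicious, escalate rather than act.
        return "ESCALATE: all candidates flagged for human review"
    # Cross-checking: return the answer the remaining candidates agree on most.
    return max(set(safe), key=safe.count)

print(answer_with_checks("some task"))
```

The failure modes that matter in a setup like this (e.g. a monitor that gets systematically fooled) sit at the wrapper level, which is the kind of thing I'd expect to look quite different from the quirks of any single 2024 LLM.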
Here's an analogy: imagine that researchers had 1960s transistors but not computers, and tried to work on cybersecurity in preparation for the cyber-disasters of the coming decades. They want to be "empirical" about it, so they go about investigating all the failure modes of 1960s transistors. They successfully demonstrate that transistors fail in extreme environments, and also that some physical attacks can be carried out at the transistor level.
But as we know now, almost all of this has been solved either at the transistor level or at the layers just above the transistors that do simple error management. Intentional attacks at the transistor level are possible, but incredibly niche compared to the rest of the cybersecurity landscape.
So just as understanding 1960s transistors would not get you far with future cybersecurity challenges, it's possible that understanding the details of 2024 LLMs won't help much with the composite AI system disasters of 2030.
(John Wentworth and others refer to much of this as the streetlight effect. I think that specific post is too harsh, but I sympathize with the main frustration.)
All that said, there are still some reasons to do the LLM research anyway. Some don't feel great, but they might still make it worthwhile.
I'm not saying I could do better. This is one reason why I'm not exactly working on technical AI safety. I have been interested in strategy in the area (which feels more tractable to me), and I've been trying to eye opportunities for technical work, but I'm still fairly unsure of what's best at this point.
I think the main challenge is that it's just fundamentally hard to prepare for a one-time event with few warning shots (i.e. the main situation we're worried about), several years in the future, in a fast-moving technical space. This felt clearly true 10 years ago, before there were language models that seemed close to TAI. I feel like it's become easier since then to overlook this bottleneck, as there's clearly a lot of work we can do with LLMs that naively seems interesting. But that doesn't mean the bottleneck is gone - it might still very much be the case that things are so early that useful empirical technical safety work is very difficult to do.
(Note: my timelines for TAI are 5+ years out. If your timelines are shorter, it makes more sense to expect that understanding current LLMs will help.)
Quick point, but I think this title is overstating things. "Is AI Hitting a Wall or Moving Faster Than Ever?" sounds like it's presupposing that the answer is one extreme or the other, when the truth is almost always somewhere in between.
I've seen a lot of media pieces use this naming convention ("Edward Snowden: Hero or Villain?"), and I'd generally recommend against it going forward.
That's interesting. But I agree that VC is a blessing and a curse. I'm hesitant to rely too much on VC-backed infrastructure, in a similar way to how I'm hesitant to rely on small-independent-project infrastructure. I wish we had better mechanisms for this sort of thing; it could provide a lot of value if more projects like this had incentive-compatible ways of making money.
Huh, this is neat!
My "Forum Personality" is "Beloved Online Karma Farmer"? This confuses me a bit - "karma farmer" typically refers to a fairly pejorative role, from what I can tell. Just FYI, this strikes me as you saying, "We noticed you're using semi-questionable methods to technically gain Karma, but it's not as bad as it normally would be." Is this meant to be like a semi-playful dig, without as negative a spin? Was there a system that determined that I gained Karma in over-rated ways? Sorry if I'm newish to some of this terminology.
I spent a few minutes digging into prediction markets to see if sentiment there has changed. I couldn't find good questions from 2023 or earlier that are still open, but here are two that have been open for about a year - and in both cases, it doesn't seem like people have become more cynical over that period.
So it roughly seems like the forecasting community hasn't really updated downwards in 2024, at least.