David Seiler

66 karma · Joined Nov 2022

Comments
6

Effective Aspersions: How the Nonlinear Investigation Went Wrong

Surely Kat didn't expect Ben would receive her evidence and do nothing with it?

Why not? According to Nonlinear, they had already told Ben they had evidence, and he'd decided to publish anyway: "He insists on going ahead and publishing this with false information intact, and is refusing to give us time to provide receipts/time stamps/text messages and other evidence". Ben already wasn't doing what Nonlinear wanted; the idea that he might continue shouldn't have been beyond their imagination. Since that's unlikely, it follows that Lightcone shouldn't have believed it, and should instead have expected that Nonlinear's threat was meant the way it was written.

More broadly, I think for any kind of claim of the form "your interpretation of what I said was clearly wrong and maybe bad faith, it should have been obvious what I really meant", any kind of thoughtful response is going to look pedantic, because it's going to involve parsing through what specifically was said, what they knew when they said it, and what their audience knew when they heard it. In this kind of discussion I think your pedantry threshold has to be set much higher than usual, or you won't be able to make progress.

Effective Aspersions: How the Nonlinear Investigation Went Wrong

David Seiler · 1y · 14

I had all that context when I read it, and the reading you're giving here still didn't occur to me. To me it says, unambiguously, two contradictory things. When I read something like that I try to find a perspective where the two things don't actually conflict. What I landed on here was "they won't sue Ben so long as he removes the parts they consider false and libelous, even if what's left is still pretty harsh". "Nonlinear won't sue so long as Ben reads the evidence, no matter what he does with it" isn't quite ruled out by the text, but leaves a lot of it unexplained: there's a lot of focus on publishing false information in that email, much more than just that one line. It doesn't really seem to make logical sense either: if some of Ben's post is libelous, why would his looking at contradictory evidence and deciding not to rewrite anything make it better?

Anyway, that's my thought process on it; if I'd got that email -- again, knowing nothing about you folks except what you wrote in the rebuttal post, and I guess that one subthread about nondisparagement agreements from the original -- then I'd certainly have taken it as a threat, contingent on publishing without changes. I hope it helps illustrate how Lightcone could take it the same way.

(searching for "libel" in the original thread also gives me this comment, making the same point three months ago, so I guess I am not adding anything new to the discourse after all. Oh well, there are probably some other people who read this thread but not that one).

Effective Aspersions: How the Nonlinear Investigation Went Wrong

David Seiler · 1y · 15

He then went around saying we'd threatened to sue if he published (link to one comment of many where they do this), when we couldn't have made it more clear that that wasn't the case.
Excerpt from email we sent to Ben. Bolding in the original.

But then, from the next paragraph of that same email:

...if published as is we intend to pursue legal action for libel against Ben Pace personally and Lightcone for the maximum damages permitted by law.

It seems to me that you and Emerson are trying to have it both ways. On the one hand, the email clearly says that you only wanted time. On the other hand, the email also clearly says that if Ben gave you that time and then didn't respond the way you wanted, you were still going to sue him. "we'd threatened to sue if he published" is a much more accurate summary of that email than "We said we would sue if they didn't give us time to share the evidence with them." IMO.

(note, I haven't read Ben's original piece, just your rebuttal)

Quick takes on "AI is easy to control"

David Seiler · 1y · 13

...it still helps move the conversation forward by clarifying where the debate is at.

Anything Nate writes would do that, because he's one of the debaters, right? He could have written "It's a stupid post and I'm not going to read it", literally just that one sentence, and it would still tell us something surprising about the debate. In some ways that post would be better than the one we got: it's shorter, and much clearer about how much work he put in. But I would still downvote it, and I imagine you would too. Even allowing for the value of the debate itself, the bar is higher than that.

For me, that bar is at least as high as "read the whole article before replying to it". If you don't have time to read an article that's totally fine, but then you don't have time to post about it either.

The Happier Lives Institute is funding constrained and needs you!

David Seiler · 1y · 19

...helpful, concrete suggestions have been relatively sparse on this post as a whole.

I don't really share this sense (I think that even most of Gregory Lewis' posts in this thread have had concretely useful advice for HLI, e.g. this one), but let's suppose for the moment that it's true. Should we care?

In the last round of posts, four to six months ago, HLI got plenty of concrete and helpful suggestions. A lot of them were unpleasant, stuff like "you should withdraw your cost-effectiveness analysis" and "here are ~10 easy-to-catch problems with the stats you published", but highly specific and actionable. What came of that? What improvements has HLI made? As far as I can tell, almost nothing has changed, and they're still fundraising off of the same flawed analyses. There wasn't even any movement on this unambiguous blunder until you called it out. It seems to me that giving helpful, concrete suggestions to HLI has been tried, and shown to be low impact.

One thing people can do in a thread like this one is talk to HLI, to praise them, ask them questions, or try to get them to do things differently. But another thing they can do is talk to each other, to try and figure out whether they should donate to HLI or not. For that, criticism of HLI is valuable, even if it's not directed to HLI. This, too, counts as "figuring out a path forward".

Could someone help me understand why it's so difficult to solve the alignment problem?

Answer by David Seiler · Jul 23, 2023 · 10

Why is [robustly pointing at goals] so difficult? Is there an argument that it is impossible?

Well, I hope it's not impossible! If it is, we're in a pretty bad spot. But it's definitely true that we don't know how to do it, despite lots of hard work over the last 30+ years. To really get why this should be, you have to understand how AI training works in a somewhat low-level way.

Suppose we want an image classifier -- something that'll tell us whether a picture has a sheep in it, let's say. Schematically, here's how we build one:

1. Start with a list of a few million random numbers. These are our parameters.
2. Find a bunch of images, some with sheep in them and some without (we know which is which because humans labeled them manually). This is our training data.
3. Pick some images from the training data, multiply them with the parameters in various ways, and interpret the result as a confidence, between 0 and 1, of whether each image has a sheep.
4. Probably it did terribly! Random numbers don't know anything about sheep.
5. So, we make some small random-ish changes to the parameters and see if that helps.
   1. For example, we might say "we changed this parameter from 0.5 to 0.6 and overall accuracy went from 51.2% to 51.22%, next time we'll go to 0.7 and see if that keeps helping or if it's too high or what."
6. Repeat step 5, a lot.
7. Eventually you get to where the parameters do a good job predicting whether the images have a sheep or not, so you stop.

I'm leaving out some mathematical details, but nothing that changes the overall picture.

All modern AI training works basically this way: start with random numbers, use them to do some task, evaluate their performance, and then tweak the numbers in a way that seems to point toward better performance.
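If it helps to see that recipe as code, here's a toy sketch in Python. Everything specific in it is an assumption made up for illustration: the "images" are just three numbers each (a hypothetical woolliness score, some noise, and a constant), there are three parameters instead of millions, and the names `predict`, `accuracy`, `make_data` and `train` are mine. It also tweaks parameters by pure trial and error, where real training uses gradients to pick the direction of each tweak; that's one of the mathematical details I'm leaving out.

```python
import math
import random


def predict(params, features):
    """Multiply features by parameters and squash the sum into a 0-to-1 confidence."""
    score = sum(p * x for p, x in zip(params, features))
    score = max(-60.0, min(60.0, score))  # keep math.exp from overflowing
    return 1.0 / (1.0 + math.exp(-score))


def accuracy(params, data):
    """Fraction of examples whose confidence falls on the correct side of 0.5."""
    hits = sum((predict(params, x) > 0.5) == label for x, label in data)
    return hits / len(data)


def make_data(rng, n=200):
    """Hypothetical stand-in for labeled images: [woolliness, noise, constant 1]."""
    data = []
    for _ in range(n):
        is_sheep = rng.random() < 0.5
        woolliness = rng.gauss(1.0 if is_sheep else -1.0, 0.5)
        data.append(([woolliness, rng.gauss(0.0, 1.0), 1.0], is_sheep))
    return data


def train(data, n_params, steps=2000, seed=0):
    """Random start, then repeated small tweaks that are kept if they don't hurt."""
    rng = random.Random(seed)
    params = [rng.uniform(-1.0, 1.0) for _ in range(n_params)]  # random start
    initial = best = accuracy(params, data)
    for _ in range(steps):
        candidate = list(params)
        candidate[rng.randrange(n_params)] += rng.gauss(0.0, 0.1)  # small tweak
        score = accuracy(candidate, data)
        if score >= best:  # keep the tweak only if accuracy didn't drop
            params, best = candidate, score
    return params, best, initial


if __name__ == "__main__":
    data = make_data(random.Random(1))
    params, best, initial = train(data, n_params=3)
    print(f"accuracy before: {initial:.3f}, after: {best:.3f}")
```

Notice that nothing in `train` ever looks at what a parameter means. It only checks whether the accuracy number got worse.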

Crucially, we never know why a change to a particular parameter is good, just that it is. Similarly, we never know what the AI is "really" trying to do, just that whatever it's doing helps it do the task -- to classify the images in our training set, for example. But that doesn't mean that it's doing what we want. For example, maybe all the pictures of sheep are in big grassy fields, while the non-sheep pictures tend to have more trees, and so what we actually trained was an "are there a lot of trees?" classifier. This kind of thing happens all the time in machine learning applications. When people talk about "generalizing out of distribution", this is what they mean: the AI was trained on some data, but will it still perform the way we'd want on other, different data? Often the answer is no.
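To make the sheep-versus-trees example concrete, here's a small hand-built sketch. It's an illustration under made-up assumptions, not a real classifier: the features `woolly_animal` and `many_trees` and the two rules are hypothetical, and nothing is trained. The point is just that two very different rules can score identically on the training data and only come apart on data that looks different.

```python
def accuracy(rule, examples):
    """Fraction of examples where the rule's answer matches the human label."""
    return sum(rule(features) == label for features, label in examples) / len(examples)


# Hypothetical training set: every sheep is in an open field, every non-sheep
# picture is full of trees.
training = [
    ({"woolly_animal": True, "many_trees": False}, True),
    ({"woolly_animal": True, "many_trees": False}, True),
    ({"woolly_animal": False, "many_trees": True}, False),
    ({"woolly_animal": False, "many_trees": True}, False),
]

# Hypothetical out-of-distribution pictures: a sheep in a forest, an empty field.
shifted = [
    ({"woolly_animal": True, "many_trees": True}, True),
    ({"woolly_animal": False, "many_trees": False}, False),
]


def intended_rule(features):
    """What we wanted: is there a sheep?"""
    return features["woolly_animal"]


def shortcut_rule(features):
    """What we might have got: are there few trees?"""
    return not features["many_trees"]


if __name__ == "__main__":
    for name, rule in [("intended", intended_rule), ("shortcut", shortcut_rule)]:
        print(name, accuracy(rule, training), accuracy(rule, shifted))
```

Both rules get every training example right, so accuracy on the training set can't tell them apart. On the shifted examples the intended rule is still right every time and the shortcut rule is wrong every time.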

So that's the first big difficulty with setting terminal goals: we can't define the AI's goals directly, we just show it a bunch of examples of the thing we want and hope it learns what they all have in common. Even after we're done, we have no way to find out what patterns it really found except by experiment, which with superhuman AIs is very dangerous. There are other difficulties but this post is already rather long.
