
Paul_Christiano

Comments

ARC returned this money to the FTX bankruptcy estate in November 2023.

I edited the original comment to use "goal-directed." Each of the candidate terms has some baggage and isn't quite right, but on balance I think "goal-directed" is better. I'm not very systematic about this choice; it's just a reflection of my mood that day.

Quantitatively, how large do you think the non-response bias might be? Do you have some experience or evidence in this area that would help estimate the effect size? I don't have much to go on, so I'd definitely welcome pointers.

Let's consider the 40% of people who put a 10% probability on extinction or similarly bad outcomes (which seems like what you are focusing on). Perhaps you are worried about something like: researchers concerned about risk might be 3x more likely to answer the survey than those who aren't concerned about risk, and so in fact only 20% of people assign a 10% probability, not the 40% suggested by the survey.
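
To make that arithmetic explicit, here's a minimal sketch of the selection-effect calculation; the 3x and 1.5x response-rate ratios below are just illustrative assumptions, not estimates:

```python
# If researchers who assign >=10% to extinction respond `ratio` times as
# often as those who don't, invert the selection effect to recover the
# true fraction from the observed fraction among respondents.
def true_fraction(observed, ratio):
    # observed = p*ratio / (p*ratio + (1 - p))  =>  solve for p
    return observed / (ratio - observed * (ratio - 1))

print(true_fraction(0.40, 3.0))  # ~0.18: a 3x effect turns a true ~20% into an observed 40%
print(true_fraction(0.40, 1.5))  # ~0.31: a 1.5x effect only moves 40% down to ~30%
```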

Changing from 40% to 20% would be a significant revision of the results, but honestly that's probably comparable to other sources of error and I'm not sure you should be trying to make that precise an inference.

But more importantly, a 3x selection effect seems implausibly large to me. The survey was presented as being about "progress in AI," and there's no obvious mechanism for huge selection effects on these questions. I haven't seen literature that would help estimate the effect size, but based on a general sense of correlation sizes in other domains I'd be pretty surprised to see a 3x or even 2x selection effect from this kind of indirect association. (A 2x effect on response rate based on views about risks would seem to imply a very serious piranha problem.)

The largest demographic selection effects were that some groups (e.g. academia vs industry, junior vs senior authors) were about 1.5x more likely to fill out the survey. Those small selection effects seem more like what I'd expect and are around where I'd set the prior (so: 40% being concerned might really be 30% or 50%).

many AI researchers just don’t seem too concerned about the risks posed by AI, so may not have opened the survey ... the loaded nature of the content of the survey (meaning bias is especially likely),

I think the survey was described as being about "progress in AI" (and mostly concerned progress in AI), and that seems to be all people saw when deciding whether to take it. Once people started taking the survey, it looks like there was negligible non-response at the question level. You can see the first page of the survey here, which I assume is representative of what people saw when deciding to take the survey.

I'm not sure if this was just a misunderstanding of the way the survey was framed. Or perhaps you think people have seen reporting on the survey in previous years and are aware that the question on risks attracted a lot of public attention, and therefore are much more likely to fill out the survey if they think risk is large? (But I think the mechanism and sign here are kind of unclear.)

specially when you account for the fact that it’s extremely unlikely other large surveys are compensating participants anywhere close to this well

If compensation is a significant part of why participants take the survey, then I think it lowers the scope for selection bias based on views (though increases the chances that e.g. academics or junior employees are more likely to respond).

I can see how other researchers citing these kinds of results (as I have!) may serve a useful rhetorical function, given readers of work that cites this work are unlikely to review the references closely

I think it's dishonest to cite work that you think doesn't provide evidence. That's even more true if you think readers won't review the citations for themselves. In my view the 15% response rate doesn't undermine the bottom line conclusions very seriously, but if your views about non-response mean the survey isn't evidence then I think you definitely shouldn't cite it.

the fact that such a broad group of people were surveyed that it’s hard to imagine they’re all actually “experts” (let alone have relevant expertise),

I think the goal was to survey researchers in machine learning, and so it was sent to researchers who publish in the top venues in machine learning. I don't think "expert" was meant to imply that these respondents had e.g. some kind of particular expertise about risk. In fact the preprint emphasizes that very few of the respondents have thought at length about the long-term impacts of AI.

Given my aforementioned concerns, I wonder whether the cost of this survey can be justified

I think it can easily be justified. This survey covers a set of extremely important questions, where policy decisions have trillions of dollars of value at stake and the views of the community of experts are frequently cited in policy discussions.

You didn't make your concerns about selection bias quantitative, but I'm skeptical that they reduce the value of information by very much. And even if we think non-response is fatal for some purposes, it doesn't interfere as much with comparisons across questions (e.g. which tasks people expect to be accomplished sooner or later, which risks they take more or less seriously) or with observing how the views of the community change over time.

I think there are many ways in which the survey could be improved, and it would be worth spending additional labor to make those improvements. I agree that sending a survey to a smaller group of recipients with larger compensation could be a good way to measure the effects of non-response bias (and might be more respectful of the research community's time).

I am not inclined to update very much on what AI researchers in general think about AI risk on the basis of this survey

I think the main takeaway w.r.t. risk is that typical researchers in ML (like most of the public) have not thought very seriously about the impacts of AI, but their intuitive reaction is that a range of negative outcomes are plausible. They are particularly concerned about some impacts (like misinformation), particularly unconcerned about others (like loss of meaning), and more ambivalent about still others (like loss of control).

I think this kind of "haven't thought about it" is a much larger complication for interpreting the results of the survey, although I think it's fine as long as you bear it in mind. (I think ML researchers who have thought about the issue in detail tend if anything to be somewhat more concerned than the survey respondents.)

many AI researchers just don’t seem too concerned about the risks posed by AI

My impressions of academic opinion have been broadly consistent with these survey results. I agree there is large variation and that many AI researchers are extremely skeptical about risk.

Yes, I'd bet the effects are even smaller than what this study found. This study gives a small amount of evidence of an effect > 0.05 SD. But without a clear mechanism I think an effect of < 0.05 SD is significantly more likely. One of the main reasons we were expecting an effect here was a prior literature that is now looking pretty bad.

That said, this was definitely some evidence for a positive effect, and the prior literature is still some evidence for a positive effect even if it's not looking good. And the upside is pretty large here since creatine supplementation is cheap. So I think this is good enough grounds for me to be willing to fund a larger study.

My understanding of the results: for the preregistered tasks you measured effects of 1 IQ point (for RAPM) and 2.5 IQ points (for BDS), with a standard error of ~2 IQ points. This gives weak evidence in favor of a small effect, and strong evidence against a large effect.
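
For reference, here's a rough conversion of those numbers into SD units, assuming the conventional 15 IQ points per standard deviation (my assumption; the study may have used a different scaling):

```python
# Convert the reported effects and standard error from IQ points to SDs.
IQ_POINTS_PER_SD = 15  # conventional scaling; an assumption on my part
SE_IQ = 2.0            # ~2 IQ points of standard error, as described above

for task, effect_iq in [("RAPM", 1.0), ("BDS", 2.5)]:
    effect_sd = effect_iq / IQ_POINTS_PER_SD
    ci_lo = (effect_iq - 1.96 * SE_IQ) / IQ_POINTS_PER_SD
    ci_hi = (effect_iq + 1.96 * SE_IQ) / IQ_POINTS_PER_SD
    print(f"{task}: ~{effect_sd:.2f} SD, 95% CI roughly ({ci_lo:.2f}, {ci_hi:.2f}) SD")
# RAPM: ~0.07 SD, 95% CI roughly (-0.19, 0.33) SD
# BDS:  ~0.17 SD, 95% CI roughly (-0.09, 0.43) SD
```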

You weren't able to measure a difference between vegetarians and omnivores. For the exploratory cognitive tasks you found no effect. (I don't know if you'd expect those tests to be sensitive enough to notice such a small effect.)

At this point it seems a bit unlikely to me that there is a clinically significant effect, maybe I'd bet at 4:1 against the effect being >0.05 SD. That said I still think it would be worthwhile for someone to do a larger study that could detect a 0.1 SD effect, since that would be clinically significant and is very weakly suggested by this data (and would make supplementation worthwhile given how cheap it is).

(See also gwern's meta-analysis.)

I think the "alignment difficulty" premise was given higher probability by superforecasters, not lower probability.

Agree that it's easier to talk about (change)/(time) rather than (time)/(change). As you say, (change)/(time) adds better. And agree that % growth rates are terrible for a bunch of reasons once you are talking about rates >50%.

I'd weakly advocate for "doublings per year": (i) 1 doubling/year is more of a natural unit (that's already a pretty high rate of growth), and it's easier to talk about multiple doublings per year than a fraction of an OOM per year; (ii) there is a word for "doubling" but no word for "increased by an OOM"; (iii) I think the arithmetic is easier.

But people might find factors of 10 so much more intuitive than factors of 2 that OOMs/year is better. I suspect this is increasingly true as you are talking more to policy makers and less to people in ML, but might even be true in ML since people are so used to quoting big numbers in scientific notation.
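
To illustrate the unit conversions under discussion (the example growth rates below are arbitrary):

```python
import math

# Convert a fractional annual growth rate into doublings/year and OOMs/year.
def doublings_per_year(growth):
    return math.log2(1 + growth)

def ooms_per_year(growth):
    return math.log10(1 + growth)

for growth in [0.5, 1.0, 9.0]:  # +50%/yr, +100%/yr, +900%/yr (10x per year)
    print(f"{growth:+.0%}/yr = {doublings_per_year(growth):.2f} doublings/yr"
          f" = {ooms_per_year(growth):.2f} OOMs/yr")

# 1 OOM/year is log2(10) ~= 3.32 doublings/year, which is part of why
# growth near one doubling per year is easier to discuss in doublings.
```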

(I'd probably defend my definitional choice for slow takeoff, but that seems like a different topic.)

Yes, I'm not entirely certain Impossible meat is equivalent in taste to animal-based ground beef. However, I do find the evidence I cite in the second paragraph of this section somewhat compelling.

Are you referring to the blind taste test? It seems like that's the only direct evidence on this question.

It doesn't look like the preparations are necessarily analogous. At a minimum, the plant burger had about 5x as much salt. All burgers were served with a "pinch" of salt, but it's hard to know what that means, and in any case the plant burger probably ended up at least 2x as salty.[1] You note this as a complicating factor, but salt has a huge impact on taste, and it seems to me like it can easily dominate the results of a 2-3 bite taste test between vaguely comparable foods.

I also have no idea at all how good or bad the comparison burger was. Food varies a lot. (It's kind of coincidental that the salt content happened to show up in the nutrition information; otherwise I wouldn't even be able to make this concrete criticism.) It seems really hard to draw conclusions about the taste competitiveness of a meat substitute from this kind of n=1 study, beyond saying that you are in the same vague zone.
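
To make the salt comparison concrete, here's a quick sketch using the figures from the footnote; the candidate sizes of a "pinch" are just guesses:

```python
# Salt content from the nutrition information (see footnote 1), in mg.
plant_salt, beef_salt = 330, 66

# How salty the plant burger ends up relative to the beef burger,
# depending on how big the added "pinch" of salt is (guessed values).
for pinch_mg in [0, 100, 200, 400]:
    ratio = (plant_salt + pinch_mg) / (beef_salt + pinch_mg)
    print(f"pinch = {pinch_mg:3d} mg -> plant burger ~{ratio:.1f}x as salty")
# 0 mg -> ~5.0x, 100 mg -> ~2.6x, 200 mg -> ~2.0x, 400 mg -> ~1.6x
```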

Have you compared these foods yourself? I eat both of them regularly. Taste competitiveness seemed plausible the first time I ate impossible ground beef, but at this point the difference feels obviously large. I seriously doubt that the typical omnivore would consider them equivalent after eating them a few times.

Overall, despite these caveats on taste, lots of plant-based meat was still sold, so it was "good enough" in some sense, but there was still potentially little resulting displacement of beef (although maybe somewhat more of chicken).

My conclusion would be: plant substitutes are good enough that some people will eat them, but bad enough that some people won't. They are better than some foods and worse than others.

It feels like you are simultaneously arguing that high uptake is a sign that taste is "good enough," and that low uptake is a sign that "good enough" taste isn't sufficient to replace meat. I don't think you can have it both ways; it's not like there is a "good enough" threshold where sales jump up to the same level as if you had competitive taste. Better taste just continuously helps with sales.

I agree and discuss this issue some in the Taste section. In short, this is part of why I think informed taste tests would be more relevant than blind: in naturalistic settings, it is possible that people would report not liking the taste of PBM even though it passes a blind taste test. So I think this accurately reflects what we should expect in practice.

I disagree. Right now I think that plant-based meat substitutes have a reputation as tasting worse than meat largely because they actually taste worse. People also have memories of disliking previous plant-based substitutes they tried. In the past the gap was even larger and there is inertia in both of these.

If you had taste competitive substitutes, then I think their reputation and perception would likely improve over time. That might be wrong, but I don't see any evidence here against the common-sense story.

  1. ^

    The plant burger had about 330mg vs 66mg of salt. If a "pinch" is 200mg then it would end up almost exactly 2x as salty. But it's hard to know exactly what a pinch means, and it also matters whether you cook the salt into the beef or put a pinch on top, and so on.

The linked LW post points out that nuclear power was cheaper in the past than it is today, and that today the cost varies considerably between different jurisdictions. Both of these seem to suggest that costs would be much lower if there was a lower regulatory burden. The post also claims that nuclear safety is extremely high, much higher than we expect in other domains and much higher than would be needed to make nuclear preferable to alternative technologies. So from that post I would be inclined to believe that overregulation is the main reason for a high cost (together with the closely related fact that we've stopped building nuclear plants and so don't benefit from economies of scale).

I can definitely believe the linked post gives a misleading impression. But I think if you want to correct that impression it would be really useful to explain why it's wrong. It would be even better to provide pointers to some evidence or analysis, but just a clear statement of disagreement would already be really helpful.

Do you think that greater adoption of nuclear power would be harmful (e.g. because the safety profile isn't good, because it would crowd out investments in renewables, because it would contribute to nuclear proliferation, or something else)? That lowering regulatory requirements would decrease safety enough that nuclear would become worse than alternative power sources, even if it isn't already? That regulation isn't actually responsible for the majority of costs? A mixture of the above? Something else altogether?

My own sense is that using more nuclear would have been a huge improvement over the actual power mix we've ended up with, and that our failure to build nuclear was mostly a policy decision. I don't fully understand the rationale, but it seems like the outcome was regulation that renders nuclear uncompetitive in the US, and it looks like this was a mistake driven in large part by excessive focus on safety. I don't know much about this so I obviously wouldn't express this opinion with confidence, and it would be great to get a link to a clear explanation of an alternative view.

I'm confused about your analysis of the field experiment. It seems like the three options are {Veggie, Impossible, Steak}. But wouldn't Impossible be a comparison for ground beef, not for steak? Am I misunderstanding something here?

Beyond that, while I think Impossible meat is great, I don't think it's really equivalent on taste. I eat both beef and Impossible meat fairly often (>1x / week for both) and I would describe the taste difference as pretty significant when they are similarly prepared.

If I'm understanding you correctly then 22% of the people previously eating steak burritos switched to Impossible burritos, which seems like a really surprisingly large fraction to me.

(Even further, consumer beliefs are presumably anchored to their past experiences, word of mouth, etc., and so even if you did have taste equivalence here I wouldn't expect people's decisions to be perfectly informed by that fact. If you produced a taste-equivalent meat substitute tomorrow and were able to get 22% of people switching in your first deployment, that would seem like a surprisingly high success rate that's very consistent with even a strong form of PTC; I wouldn't expect consumers to switch immediately even if they will switch eventually. Getting those results with Impossible meat vs steak seems even more encouraging.)
