...if they had explained why their views were not moved by the expert reviews OpenPhil has already solicited.
In "AI Timelines: Where the Arguments, and the 'Experts,' Stand," Karnofsky writes:
Then, we commissioned external expert reviews.[7]
Speaking only for my own views, the "most important century" hypothesis seems to have survived all of this. Indeed, having examined the many angles and gotten more into the details, I believe it more strongly than before.
The footnote text reads, in part:
Reviews of Bio Anchors are here; reviews of Explosive Growth are here; reviews of Semi-informative Priors are here.
Many of these reviewers disagree strongly with the reports under review.
Davidson 2021 on semi-informative priors received three reviews.
In my judgment, all three made strong negative assessments, in the sense (among others) that if one agreed with the review, one would not use the report's reasoning to inform decision-making in the manner advocated by Karnofsky (and by Beckstead).
From Hajek and Strasser's review:
His final probability of 7.3% is a nice summary of his conclusion, but its precision (including a decimal place!) belies the vagueness of the question, the imprecise and vague inputs, and the arbitrary/subjective choices Tom needs to make along the way—we discuss this more in our answers to question 8. We think a wider range is appropriate given the judgment calls involved. Or one might insist that an imprecise probability assignment is required here. Note that this is not the same as a range of permissible sharp probabilities. Following e.g. Joyce, one might think that no precise probability is permissible, given the nature of the evidence and the target proposition to which we are assigning a credence.
From Hanson's review:
I fear that for this application, this framework abstracts too much from important details.
For example, if the actual distribution is some generic lump, but the model distribution is an exponential falling from an initial start, then the errors that result from this difference are probably worse regarding the lowest percentiles of either distribution, where the differences are most stark. So I’m more comfortable using such a simple model to estimate distribution medians, relative to low percentiles. Alas, the main products of this analysis are exactly these problematic low percentile estimates.
From Halpern's review:
If our goal were a single estimate, then this is probably as reasonable as any other. I have problems with the goal (see below). [...]
As I said above, I have serious concerns about the way that dynamic issues are being handled. [...]
I am not comfortable with modeling uncertainty in this case using a single probability measure.
Davidson 2021 on explosive growth received many reviews; I'll focus on the five reviewers who read the final version.
Two of the reviewers found little to disagree with. These were Leopold Aschenbrenner (a Future Fund researcher) and Ege Erdil (a Metaculus forecaster).
The other three reviewers were academic economists specializing in growth and/or automation. Two of them made strong negative assessments.
From Ben Jones' review:
Nonetheless, while this report suggests that a rapid growth acceleration is substantially less likely than singularity-oriented commentators sometimes advocate, to my mind this report still sees 30% growth by 2100 as substantially likelier than my intuitions would suggest. Without picking numbers, and acknowledging that my views may prove wrong, I will just say that achieving 30% growth strikes me as very unlikely. Here I will articulate some reasons why, to provoke further discussion.
From Dietrich Vollrath's review:
All that said, I think the probability of explosive growth in GWP is very low. Like 0% low. I think those issues I raised above regarding output and demand will bind and bite very hard if productivity grows that fast.
The third economist, Paul Gaggl, agreed with the report about the possibility of high GWP growth but raised doubts as to how long it could be sustained. (How much this matters depends on what question we're asking; "a few decades" of 30% GWP growth is not a permanent new paradigm, but it is certainly a big "transformation.")
Reviews of Cotra (2020) on Biological Anchors were mostly less critical than the above.
I expect that some experts would be much more likely to spend time and effort on the contest if

1. they had a higher opinion of the Future Fund's overall epistemics, and
2. they had a clearer sense of where the cruxes lie between their own views and the Fund's reasoning.
These considerations seem especially relevant for the "dark matter" experts hypothesized in this post and Karnofsky's, who "find the whole thing so silly that they're not bothering to engage." These people are unusually likely to have a low opinion of the Future Fund's overall epistemics (point 1), and they are also likely to disagree with the Fund's reasoning along a relatively large number of axes, so that locating a crux becomes more of a problem (point 2).

Finally: I, personally, would be more likely to submit to the contest if I had a clearer sense of where the cruxes were, and why past criticisms have failed to stick. (For clarity, I don't consider myself an "expert" in any relevant sense.)
While I don't "find the whole thing so silly I don't bother to engage," I have relatively strong methodological objections to some of the OpenPhil reports cited here. There is a large inferential gap between me and anyone who finds these reports prima facie convincing. Given the knowledge that someone does find them prima facie convincing, and little else, it's hard to know where to begin in trying to close that gap.
Even if I had better guidance, the size of the gap increases the effort required and decreases my expected probability of success, making me less likely to contribute. This dynamic seems like a source of potential bias in the distribution of responses, though I don't have any great ideas for what to do about it.
I completely agree.
I've worked in ML engineering and research for over 5 years at two companies, I have a PhD (though not in ML), and I've interviewed many candidates for ML engineering roles.
If I'm reviewing a resume and I see someone has just graduated from a PhD program (and does not have other job experience), my first thoughts are
I've never interviewed a candidate with 4 years at OpenAI on their resume, but if I had, my very first thoughts would involve things like
I dunno, I might be overrating OpenAI here?
But I think the comment in the post at least requires some elaboration, beyond just saying "many places have a PhD requirement." That's an easy way to filter candidates, but it doesn't mean people in the field literally think that PhD work is fundamentally superior to (and non-fungible with) all other forms of job experience.