Tyler Johnston

Executive Director @ The Midas Project
1210 karma · Working (0-5 years) · Tulsa, OK, USA

Bio


Book a 1:1 with me: https://cal.com/tylerjohnston/book

Share anonymous feedback with me: https://www.admonymous.co/tylerjohnston

Comments (70)

Hey yanni,

I just wanted to return to this and say that I think you were directionally correct here and, in light of recent news, recommending jobs at OpenAI in particular was probably a worse mistake than I realized when I wrote my original comment.

Reading the recent discussion about this reminded me of your post, and it's good to see that 80k has updated somewhat. I still don't know quite how to feel about the recommendations they've left up in infosec and safety, but I think I'm coming around to your POV here.

Thank you for writing this criticism! I did give it a read, and I shared some of your concerns around the framing and geopolitical stance that the piece takes.

Regarding the OOM issue, you ask:

Order of magnitude of what? Compute? Effective compute? Capabilities?

I'll excerpt the following from the "count the OOMs" section of the essay:

We can decompose the progress in the four years from GPT-2 to GPT-4 into three categories of scaleups:

  1. Compute: We’re using much bigger computers to train these models. 
  2. Algorithmic efficiencies: There’s a continuous trend of algorithmic progress. Many of these act as “compute multipliers,” and we can put them on a unified scale of growing effective compute.
  3. “Unhobbling” gains: By default, models learn a lot of amazing raw capabilities, but they are hobbled in all sorts of dumb ways, limiting their practical value. With simple algorithmic improvements like reinforcement learning from human feedback (RLHF), chain-of-thought (CoT), tools, and scaffolding, we can unlock significant latent capabilities.

We can “count the OOMs” of improvement along these axes: that is, trace the scaleup for each in units of effective compute. 3x is 0.5 OOMs; 10x is 1 OOM; 30x is 1.5 OOMs; 100x is 2 OOMs; and so on. We can also look at what we should expect on top of GPT-4, from 2023 to 2027.

It's clear to me what Aschenbrenner is referring to when he says "OOMs" — orders-of-magnitude scaleups in the three things he mentions here: compute (measured in training FLOP), algorithmic efficiencies (measured by the fraction of training FLOP needed to achieve comparable capabilities after algorithmic improvements), and unhobbling (measured, or rather estimated, by the scaleup in training FLOP that would have produced performance improvements equivalent to those the unhobbling provided). I'll grant you, as does he, that unhobbling is hand-wavy and hard to measure (although that by no means implies it isn't real).
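In case the notation is unclear to anyone reading along, an OOM here is just the base-10 logarithm of a scaleup factor. A minimal sketch of the arithmetic; the scaleup numbers in the second half are hypothetical illustrations, not Aschenbrenner's estimates:

```python
import math

def ooms(scaleup: float) -> float:
    """Orders of magnitude of a scaleup factor, i.e. its base-10 logarithm."""
    return math.log10(scaleup)

# The arithmetic behind "3x is 0.5 OOMs; 10x is 1 OOM; 30x is 1.5 OOMs; 100x is 2 OOMs"
for factor in (3, 10, 30, 100):
    print(f"{factor}x ≈ {ooms(factor):.1f} OOMs")

# Because each axis is expressed in units of effective compute, OOMs add
# (the underlying multipliers multiply). These inputs are made up:
raw_compute_ooms = ooms(3000)   # hypothetical ~3,000x raw training-compute scaleup
algorithmic_ooms = ooms(10)     # hypothetical ~10x algorithmic-efficiency gain
unhobbling_ooms = ooms(5)       # hypothetical ~5x FLOP-equivalent gain from unhobbling
total = raw_compute_ooms + algorithmic_ooms + unhobbling_ooms
print(f"Total effective-compute scaleup: ~{total:.1f} OOMs")
```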

You could still take issue with other questions — as you do — including how strong the relationship is between compute and capabilities, or how well we can measure capabilities in the first place. But we can certainly measure floating-point operations! So accusing him of using "OOMs" as a unit, and one that is unmeasurable/detached from reality, surprises me.

Also, speaking of the "compute-capabilities relationship" point, you write:

The general argument seems to be that increasing the first two "OOMs", i.e. increasing compute and improving algorithms, the AI capabilities will also increase. Interestingly, most of the examples given are actually counterexamples to this argument.

This surprised me as well since I took the fact that capabilities have improved with model scaling to be pretty incontrovertible. You give an example:

There are two image generation examples (Sora and GANs). In both examples, the images become clearer and have higher resolution as compute is increased or better algorithms are developed. This is framed as evidence for the claim that capabilities increase as "OOMs" increase. But this is clearly not the case: only the fidelity of these narrow-AI systems increase, not their capabilities.

I think I might see where the divergence between our reactions is. To me, "capabilities" for an image model means roughly "the capability to generate a clear, high-quality image depicting the prompt." As you admit, that has improved with scale. I think this definition probably best reflects common usage in the field, so I do think it supports his argument. And I personally think that there are deeper capabilities being unlocked, too — for example, in the case of Sora, the capability of understanding (at least the practical implications of) object permanence, gravity, and reflections. But I think others would be more inclined to disagree with that.

Huh, interesting! I guess you could define it this way, but I worry that muddies the definition of "campaign target." In common usage, I think the definition is approximately: what is the institution you are raising awareness about and asking to adopt a specific change? A simple test to determine the campaign target might be "What institution is being named in the campaign materials?" or "What institution has the power to end the campaign by adopting the demands of the campaigners?"

In the case of animal welfare campaigns against foodservice providers, it seems like that's clearly the foodservice companies themselves. Then, in the process of that campaign, one thing you'll do is raise awareness about the issue among that company's customers (e.g. THL's "foodservice provider guide" which raised awareness among public institutions), which isn't all that different from raising awareness among the public in a campaign targeting a B2C company.

I suppose this is just a semantic disagreement, but in practice, it suggests to me that B2B businesses are still vulnerable, in part because they aren't insulated from public opinion—they're just one degree removed from it.

EDIT: Another, much stronger piece of evidence in favor of influence on B2B: Chicken Watch reports 586 commitments secured from food manufacturers and 60 from distributors. Some of those companies are functionally B2C (e.g. manufacturing consumer packaged goods sold under their own brand) but some are clearly B2B (e.g. Perdue Farms' BCC commitment).

Thanks for the comment! I agree with a lot of your thinking here and that there will be many asymmetries.

One random thing that might surprise you: in fact, the sector that animal groups have had the most success with is a B2B one: foodservice providers. For B2B companies, individual customers are fewer in number but far more important in magnitude — so the prospect of convincing, for example, an entire hospital or university to switch its multi-million dollar contract to a competitor with a higher standard for animal welfare is especially threatening. I think the same phenomenon might carry over to the tech industry. However, even in the foodservice provider case, public perception is still one of the main driving factors (i.e., universities and hospitals care about the animal welfare practices of their suppliers in part because they know their students/clients care).

Your advice about outreach to employees and other stakeholders is well-taken too :) Thanks!

Hey! Thanks for the comment - this makes sense. I'm the founder and executive director (that's why I made this post under my name!), and The Midas Project is a nonprofit. By law, that means details about our funding will be made public in annual filings and available upon request, and that our work has to exclusively serve the public interest rather than privately benefit anyone associated with the organization (which is generally determined by the IRS and/or independent audits). Hope this assuages some concerns.

It's true we don't have a "team" page or anything like that. FWIW, this is clearly the norm for campaigning/advocacy nonprofits (for example, take a look at the websites for the animal groups I mentioned, or Greenpeace/Sunrise Movement in the climate space) and that precedent is a big part of why I chose the relative level of privacy here — though I'm open to arguments that we should do it differently. I think the most important consideration is protecting the privacy of individual contributors since this work has the potential to make some powerful enemies... or just to draw the ire of e/accs on Twitter. Maybe both! I would be more open to adding an “our leadership” page, which is more common for such orgs - but we’re still building out a leadership team so it seems a bit premature. And, like with funding, leadership details will all be in public filings anyway.

Thanks again for the feedback! It's useful.

Thank you!

You’re right that the main tasks are digital advocacy - but even if you’re not on social media, there are some direct outreach tasks that involve emailing and calling specific stakeholders. We have one task like that live on our action hub now, and will be adding more soon.

Outside of that, we could use all sorts of general volunteer support - anything from campaign recruitment to writing content. Also always eager to hear advice on strategy. Would love to chat more if you’re interested.

Good question! I basically agree with you about the relative importance of foundation model developers here (although I haven’t thought too much about the third point you mentioned. Thanks for bringing it up.)

I should say we are doing some other work to raise awareness about foundation model risks - especially at OpenAI, given recent events - but not at the level of this campaign.

The main constraint was starting (relatively) small. We’d really like to win these campaigns, and we don’t plan to let up until we have. The foundation model developers are generally some of the biggest companies in the world (hence the huge compute, as you mention), and the resources needed to win a campaign likely scale in proportion to the size of the target. We decided it’d be good to keep building our supporter base and reputation before taking the bigger players on. Cognition in particular seems to sit at the center of the three-way Venn diagram of “making high-risk systems,” “way behind the curve on safety issues,” and “small enough that they can’t afford to ignore this.”

Btw, my background is in animal advocacy, and this is somewhat similar to how groups scaled there: they started by getting local restaurants to stop serving foie gras, and scaled up to getting McDonald's to phase out eggs from battery cages nationwide. Obviously we have less time with this issue - so I would like to scale quickly.

The THL estimate is a little strange, I think — the $2.63 is really just their US branch's total 2022 expenses on cage-free campaigns divided by the number of hens alive at any given time in the supply chains of the companies they persuaded that year. I'm not sure how they are calculating cage-free campaign spend as a proportion of total budget, nor what "persuaded" means (anyone they did outreach to? anyone they secured new commitments from?). Also, the number doesn't account for the fact that once one hen dies, another takes its place in the same living conditions (although the article acknowledges this limitation). So the real value is the delta, in years, between if/when cage-free would have taken hold by default and when it did/will thanks to their campaign.
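To make the "delta in years" point concrete, here is a minimal sketch of the adjustment I have in mind. All of the numbers are made up for illustration; they are not THL's or Rethink Priorities' figures:

```python
# Hypothetical inputs for illustration only; not THL's or RP's actual figures.
campaign_spend = 10_000_000      # dollars spent on cage-free campaigns in a year
hens_at_any_time = 4_000_000     # hens alive at any given time under the affected commitments

# The naive figure: spend divided by the hens currently in the supply chain.
naive_cost_per_hen = campaign_spend / hens_at_any_time

# The adjusted figure: because each hen that dies is replaced by another living
# in the same conditions, the campaign should only get credit for the years by
# which it accelerated the cage-free transition.
years_accelerated = 5            # delta between counterfactual and actual adoption
hen_years_improved = hens_at_any_time * years_accelerated
cost_per_hen_year = campaign_spend / hen_years_improved

print(f"Naive:    ${naive_cost_per_hen:.2f} per hen currently housed")
print(f"Adjusted: ${cost_per_hen_year:.2f} per hen-year of earlier cage-free housing")
```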

Saulius, the author of the RP report that estimates 12-160 chicken-years impacted per dollar spent, says the following as of 3 months ago:

A new estimate would probably output a similar number because reforms have probably gotten less effective, but I now think that I underestimated cost-effectiveness in this report.

Meanwhile, Open Phil says the following about the same report, but referring to marginal opportunities in particular. It's unclear to me whether they're thinking of cage-free campaign spend as a “marginal FAW funding opportunity,” however.

We think that the marginal FAW funding opportunity is ~1/5th as cost-effective as the average from Saulius’ analysis.

Thanks for this reply — it does resonate with me. It actually got me thinking back to Paul Bloom's Against Empathy book, and how when I read that I thought something like: "oh yeah empathy really isn't the best guide to acting morally," and whether that view contradicts what I was expressing in my quick take above.

I think I probably should have framed the post more as "longtermism need not be totally cold and utilitarian": there's an emotional, caring psychological relationship we can have to hypothetical future people, because we can imaginatively put ourselves in their shoes. It might even incorporate elements of justice or fairness if we consider them a disenfranchised group without representation in today's decision-making, whom we are potentially throwing under the bus for our own benefit, or something like that. So justice and empathy can easily be folded into longtermist thinking. This sounds like what you are saying here, except maybe I do want to stand by the claim that EA values aren't necessarily trading off against justice, depending on how you define it.

If we go extinct, they won't exist

Yeah, I meant to convey this in my post, but framed it a bit differently — that they are real people with valid moral claims who may exist. I suppose framing it this way just moves the hypothetical condition elsewhere, to emphasize that, if they do exist, they would be real people with real moral claims, and that matters. Maybe that's confusing, though.

BTW, my personal views lean towards a suffering-focused ethics that isn't seeking to create happy people for their own sake. But I still think that, in coming to that view, I'm concerned with the experience of those hypothetical people in the fuzzy, caring way that utilitarians are charged with disregarding. That's my main point here. But maybe I just get off the crazy train at my unique stop. I wouldn't consider tiling the universe with hedonium to be the ultimate act of care/justice, but I suppose someone could feel that way, and thereby make an argument along the same lines.

Agreed there are other issues with longtermism — just wanted to respond to the "it's not about care or empathy" critique.
