Muireall

253 karma
muireall.space

Comments (26)

Seeing some conversations about lack of social graces as a virtue reminded me that I wanted to say a few things in praise of professionalism.

By "professionalism" I mean a certain kind of forthrightness and constancy. The professional has an aura of assurance that judgment falls on the work, not the person. They compartmentalize. They project clearly defined boundaries. They do the sorts of things professionals do.

Professionalism is not a subculture. The professional has no favorites or secret handshakes. Or, at its best, professionalism is the universal secret handshake, even if it's only a handshake offered, as it were.

You will be treated with respect and consideration, but professionalism is not a virtue of compassion, nor even generosity. It might have been a virtue of justice. It is a virtue of curiosity.

I also think it overlaps surprisingly closely with professionalism as generally conceived.

Very clear piece!

In this framework, the value of our future is equal to the area under this curve and the value of altering our trajectory is equal to the area between the original curve and the altered curve.

You mentioned optimal planning in economics, and I've wondered whether an optimal control framework might be useful for this sort of analysis. I think the difference between optimal control and the trajectory-altering framework you describe is a bit deeper than the different typical domains. There's not just one decision to be made, but a nearly continuous series of decisions extending through the future (a "policy"). Under uncertainty, the present expected value is the presently realized ("instantaneous") value plus the expectation, taken over different futures, of the expected value of those futures. Choosing a policy to maximize expected realized value is the control problem.
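To make that concrete, a discrete-time sketch (my notation, not anything from the post): with state $x_t$, policy $\pi$, and instantaneous value $v$,

$$V_\pi(x_t) = v(x_t, \pi(x_t)) + \mathbb{E}\big[V_\pi(x_{t+1}) \mid x_t, \pi(x_t)\big], \qquad V^*(x_t) = \max_\pi V_\pi(x_t).$$

The first equation is the "present value = instantaneous value + expected future value" recursion; the second is the control problem.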

For the most part, you get the same results with a slightly different interpretation. For example, rather than impose some τ, you get an effective lifetime that's mainly determined by "background" risk but is also allowed to vary with policy.
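In symbols (again my own rough sketch, assuming a hazard-rate model of existential risk): with value rate $v(t)$ and hazard rate $h(t, \pi)$ partly under the policy's control,

$$\mathbb{E}[V] = \int_0^\infty v(t)\, S(t)\, dt, \qquad S(t) = \exp\!\left(-\int_0^t h(s, \pi(s))\, ds\right),$$

so for roughly constant $h$ the survival factor acts like an effective lifetime $\tau_\text{eff} \approx 1/h$, set mostly by background risk but adjustable at the margin.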

One thing that jumped out at me in a toy model is that while the value of reducing existential risk is mathematically the same as an "enhancement" (multiplicative), the time at which we expect to realize that extra value can be very different. In particular, an expected value maximizer may heavily backload the realization of value (even beyond the present expected survival time) if they can neglect present value to expand while reducing existential risk.
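A minimal constant-rate version of that toy model: with constant value rate $v$ and hazard $h$,

$$\mathbb{E}[V] = \int_0^\infty v\, e^{-h t}\, dt = \frac{v}{h}, \qquad \bar{t} = \frac{\int_0^\infty t\, v\, e^{-h t}\, dt}{\int_0^\infty v\, e^{-h t}\, dt} = \frac{1}{h}.$$

Multiplying $v$ by $(1+\epsilon)$ and dividing $h$ by $(1+\epsilon)$ both scale $\mathbb{E}[V]$ by the same factor, but the value-weighted mean realization time stays at $1/h$ in the first case and moves out to $(1+\epsilon)/h$ in the second, i.e. past the original expected survival time.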

I suspect one could learn or make clearer a few other interesting things by following those lines.

That's right. (But lower is better for some other common scoring rules, including the Brier score.)
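(For reference, the binary-outcome Brier score is just the mean squared error of the forecast probabilities, so 0 is perfect:

$$\text{Brier} = \frac{1}{N}\sum_{i=1}^{N} (f_i - o_i)^2,$$

with forecasts $f_i \in [0,1]$ and outcomes $o_i \in \{0,1\}$.)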

Social dynamics seem important, but I also think Scott Alexander in "Epistemic Learned Helplessness" put his finger on something important in objecting to the rationalist mission of creating people who would believe something once it had been proven to them. Combined with "taking ideas seriously" and decompartmentalization, attempting to follow the rules of rationality can itself be very destabilizing.

Is there a canonical discussion of what you call "race dynamics" somewhere? I can see how proliferating firms and decentralized communities would "mak[e] potential moratoriums on capabilities research much harder (if not impossible) to enforce", but it's less clear to me what that means for how quickly capabilities advance. Is there evidence that, say, the existence of Anthropic has led to increased funding for OpenAI?

In particular, one could make the opposite argument—competition, at least intra-nationally, slows the feedback cycle for advancing capabilities. For example, a lot of progress in information technology seems to have been driven by concentration of R&D into Bell Labs. If the Bell monopoly had been broken up sooner, would that have accelerated progress? If some publicly-funded entity had provided email and internet search services, would Google have reached the same scale?

Meanwhile, training leading-edge models is capital intensive, and competing firms dilute available funding across many projects. Alternative commercial and open-source models drive potential margins down. Diminished prospects for monopoly limit the size and term of bets that investors are willing to make.

I don't know which way the evidence actually falls, but there seems to be a background assumption that competition, race dynamics, and acceleration of progress on capabilities always go hand in hand. I'd be very interested to read more detailed justifications for that assumption.

(Here's my submission—I make some similar points but don't do as much to back them up. The direction is more like "someone should try taking this sort of thing into account"—so I'm glad you did!)

I'd have to think more carefully about the probabilities you came up with and the model for the headline number, but everything else you discuss is pretty consistent with my view. (I also did a PhD in post-silicon computing technology, but unlike Ted I went right into industry R&D afterwards, so I imagine I have a less synoptic view of things like supply chains. I'm a bit more optimistic, apparently—you assign <1% probability to novel computing technologies running global-scale AI by 2043, but I put down a full percent!)

The table "Examples transistor improvements from history (not cherry-picked)" is interesting. I agree that the examples aren't cherry picked, since I had nearly the same list (I decided to leave out lithography and included STI and the CFET on imec's roadmap), but you could choose different prototype dates depending on what you're interested in.

I think you've chosen a fairly relaxed definition for "prototype", which is good for making the point that it's almost certain that the transistors of 2043 will use a technology we already have a good handle on, as far as theoretical performance is concerned.

Another idea would be to follow something like this IRDS table that splits out "early invention" and "focused research". They use what looks like a stricter interpretation of invention—they don't explain further or give references, but I suspect they just have in mind more similarity to the eventual implementation in production. (There are still questions about what counts, e.g., 1987 for tri-gate or 1998 for FinFET?) That gives about 10–12 years from focused research to volume production.

So even if some unforeseeable breakthrough is more performant or more easily scalable than what we're currently thinking about, it still looks pretty tough to get it out by 2043.

I think FQxI usually gets around 200 submissions for its essay contests where the entire pot is less than the first prize here. I wouldn't be surprised if Open Phil got over 100 submissions.

In fact, algorithmic progress has been found to be similarly as important as compute for explaining progress across a variety of different domains, such as Mixed-Integer Linear Programming, SAT solvers, and chess engines -- an interesting coincidence that can help shed light on the source of algorithmic progress (Koch et al. 2022, Grace 2013). From a theoretical perspective, there appear to be at least three main explanations of where algorithmic progress ultimately comes from:

  1. Theoretical insights, which can be quickly adopted to improve performance.
  2. Insights whose adoption is enabled by scale, which only occurs after there's sufficient hardware progress. This could be because some algorithms don't work well on slower hardware, and only start working well once they're scaled up to a sufficient level, after which they can be widely adopted.
  3. Experimentation in new algorithms. For example, it could be that efficiently testing out all the reasonable choices for new potential algorithms requires a lot of compute.

I always feel kind of uneasy about how the term "algorithmic progress" is used. If you find an algorithm with better asymptotics, then apparent progress depends explicitly on the problem size. MILP seems like a nice benchmark because it's NP-hard in general, but then again most(?) improvements exploit the structure of special classes of problems. Is that general progress?
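As a toy illustration of the problem-size dependence (hypothetical cost functions, not a benchmark of any real solver): an asymptotic improvement from O(n^2) to O(n log n) reads as a very different "speedup" depending on the n you measure it at.

```python
import math

# Toy illustration with hypothetical cost functions (not real solver data):
# the speedup credited to an asymptotically better algorithm depends on the
# problem size at which it's measured.

def baseline_cost(n):
    """Operation count for a hypothetical O(n^2) baseline."""
    return n ** 2

def improved_cost(n):
    """Operation count for a hypothetical O(n log n) replacement."""
    return n * math.log2(n)

for n in (10**3, 10**4, 10**5, 10**6):
    speedup = baseline_cost(n) / improved_cost(n)
    print(f"n = {n:>9,}: apparent speedup ~ {speedup:,.0f}x")
```

The same algorithmic change shows up as roughly a 100x gain at n = 1,000 and a 50,000x gain at n = 1,000,000, so a single "rate of algorithmic progress" implicitly fixes a problem size.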

One important factor affecting our ability to measure algorithmic progress is the degree to which algorithmic progress on one task generalizes to other tasks. So far, much of our data on algorithmic progress in machine learning has been on ImageNet. However, there seem to be two ways of making algorithms more efficient on ImageNet. The first way is to invent more efficient learning algorithms that apply to general tasks. The second method is to develop task-specific methods that only narrowly produce progress on ImageNet.

We care more about the rate of general algorithmic progress, which in theory will be overestimated by measuring the rate of algorithmic progress on any specific narrow task. This consideration highlights one reason to think that estimates overstate algorithmic progress in a general sense.

I definitely agree with the last sentence, but I'm still not sure how to think about this. I have the impression that, typically, some generalizable method makes a problem feasible, at which point focused attention on applying related methods to that problem drives solving it towards being economical. I suppose in this framework we'd still care more about the generalizable methods, because those fire the starting gun for each automatable task?
