EDIT: Someone on LessWrong linked a great report by Epoch that tries to answer exactly this.
With the release of OpenAI o1, I want to ask a question I've been wondering about for a few months.
The Chinchilla paper estimated the optimal ratio of training data to model size for a given compute budget. Are there any similar estimates for the optimal ratio of compute to spend on inference versus training?
In the release they show this chart:
The chart somewhat gets at what I want to know, but doesn't answer it completely. How much additional inference compute would a 1e25 o1-like model need to perform as well as a one-shot 1e26 model?
Additionally, for some x number of queries, what is the optimal ratio of compute to spend on training versus inference? How does that change for different values of x?
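To make the question concrete, here is a toy sketch of the optimization I have in mind. Everything in it is an assumption for illustration, not an empirical result: I pretend that reaching a fixed performance level requires `training * inference_per_query**k` to hit some constant (`required_product`), where `k` is a made-up exchange rate between the two kinds of compute. The function names and parameters are hypothetical.

```python
import math


def optimal_split(required_product, k, num_queries):
    """Minimize training + num_queries * inference_per_query.

    ASSUMPTION (toy model, not measured): a fixed performance level is reached
    whenever training * inference_per_query**k == required_product.
    Substituting training = required_product / inference_per_query**k and
    setting the derivative of total compute to zero gives the closed form below.
    """
    if required_product <= 0 or k <= 0 or num_queries <= 0:
        raise ValueError("all arguments must be positive")
    inference_per_query = (k * required_product / num_queries) ** (1.0 / (k + 1.0))
    training = required_product / inference_per_query ** k
    return training, inference_per_query


def total_compute(training, inference_per_query, num_queries):
    """Total compute spent over the lifetime of the model."""
    return training + num_queries * inference_per_query


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show how the split moves with x.
    for x in (1e6, 1e9, 1e12):
        t, i = optimal_split(required_product=1e30, k=1.0, num_queries=x)
        print(f"x={x:.0e}  training={t:.2e}  inference/query={i:.2e}  "
              f"total={total_compute(t, i, x):.2e}")
```

In this toy model, a 10x smaller training run needs 10^(1/k) times more inference compute per query to match, and the optimal total inference spend always works out to k times the training spend, whatever x is. Whether real models behave anything like this is exactly what I'm asking about.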
Are there any public attempts at estimating this stuff? If so, where can I read about it?
Good question. I'm actually not sure how I get it into my email, and I can't find it on the website either.
Edit: I think it's through the forecasting newsletter.
I can highly recommend following Sentinel's weekly minutes, a weekly update from superforecasters on the likelihood of events that could plausibly cause a worldwide catastrophe.
It is perhaps the weekly newsletter I look forward to the most at this point. Read previous issues here:
Naively, is there a case for using the average of the two?