Have you considered contacting the authors of the original QF paper? Glenn and Vitalik seem quite approachable. You could also post the paper on the RxC discord or (if you're willing to go for a high-effort alternative) submit it to their next conference.
Thanks for writing this up!
I think your (largely negative) results on QF under incomplete information should be more widely known. I consider myself relatively "plugged into" the online communities that have discussed QF the most (RxC, crypto, etc.), and yet I only learned about your paper a couple of months ago.
Here are a few more scattered thoughts prompted by the post:
I've been thinking about regranting on and off for about a year, specifically about whether it makes sense to use bespoke mechanisms like quadratic funding or some of its close cousins. I still don't know where I land on many design choices, so I won't say more about that now.
I'm not aware of any retrospective on FTXFF's program, but it might be a good idea to do one once we have enough information to evaluate performance (so in 6-12 months?). Another thing in this vein that I think would be valuable, and could happen right away, is looking into SFF's S-process.
Thanks! That's a reasonable strategy if you can choose the question wording. I agree there's no difference mathematically, but I'm not so sure that's true cognitively. I've sometimes seen asymmetric calibration curves that look fine above 50% but show overprediction below 50%, which suggests it's easier to stay calibrated on the subset of questions you think are more likely than not to happen. This is good news for your strategy! However, note that this is based on a few anecdotal observations, so I'd caution against updating too strongly on it.
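For concreteness, here's a minimal Python sketch of the kind of check I mean (the data here is synthetic and perfectly calibrated by construction; you'd plug in your own forecast record instead): compute the calibration curve separately for predictions below and above 50% and see whether the low side drifts above the diagonal.

```python
import numpy as np

def calibration_curve(preds, outcomes, bins):
    """Mean forecast vs. observed frequency within each probability bin."""
    preds, outcomes = np.asarray(preds, float), np.asarray(outcomes, float)
    curve = []
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (preds >= lo) & (preds < hi)
        if mask.any():
            curve.append((round(preds[mask].mean(), 3), round(outcomes[mask].mean(), 3)))
    return curve

# Toy data standing in for a real forecast record (preds, outcomes).
rng = np.random.default_rng(0)
preds = rng.uniform(0, 1, 2000)
outcomes = rng.binomial(1, preds)  # perfectly calibrated by construction

# Separate curves for the two halves of the probability scale. On real data,
# the asymmetry would show up as the <50% curve sitting above the diagonal
# (overprediction) while the >=50% curve tracks it.
print(calibration_curve(preds, outcomes, np.linspace(0.0, 0.5, 6)))
print(calibration_curve(preds, outcomes, np.linspace(0.5, 1.0, 6)))
```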
Glad you brought up real-money markets, because the real choice here isn't "5 unpaid superforecasters" vs. "200 unpaid average forecasters" but "5 really good people who charge $200/h" vs. "200 internet anons who'll do it for peanuts". Once you notice the difference in unit labor costs, the question becomes: for a fixed budget, what's the optimal trade-off between crowd size and skill? I'm really uncertain about that myself and have never seen good data on it.
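To make that trade-off tangible, here's a toy simulation (all the costs and noise levels are invented, and it assumes forecasters' errors are independent, which real crowds violate): each forecaster reports the true probability plus noise, the crowd forecast is the simple average, and we compare Brier scores at a fixed budget.

```python
import numpy as np

rng = np.random.default_rng(42)

def crowd_brier(n_forecasters, noise_sd, n_questions=20_000):
    """Brier score of the averaged crowd forecast under a toy model:
    each forecaster reports the true probability plus independent noise."""
    truth = rng.uniform(0.05, 0.95, n_questions)
    outcomes = rng.binomial(1, truth)
    forecasts = np.clip(truth + rng.normal(0, noise_sd, (n_forecasters, n_questions)), 0, 1)
    crowd = forecasts.mean(axis=0)
    return np.mean((crowd - outcomes) ** 2)

budget = 10_000  # arbitrary
# (cost per forecaster, per-forecaster noise sd) -- both invented for illustration
options = {"5 experts @ $2000": (2_000, 0.05), "200 anons @ $50": (50, 0.25)}

for name, (cost, sd) in options.items():
    n = budget // cost
    print(f"{name}: n={n}, Brier ~ {crowd_brier(n, sd):.4f}")
```

The answer in this toy model depends entirely on the made-up cost and skill numbers, and it ignores correlated errors, which is exactly the data I'd want to see.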
Great analysis!
I wonder what would happen if you were to do the same exercise with the fixed-year predictions under a 'constant risk' model, i.e. P(t) = 1 - exp(-λ·t) with λ = -log(1 - P(year)) / year, to get around the problem that we're still 3 years away from 2026. Given that timelines are systematically longer with a fixed-year framing, I would expect the Brier scores of those predictions to be worse. OTOH, the constant risk model doesn't seem very reasonable here, so the results wouldn't have a straightforward interpretation.
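In code, the rescoring I'm imagining would look something like this (the forecast numbers below are hypothetical):

```python
import math

def hazard_rate(p_by_deadline, years_to_deadline):
    """Constant-risk hazard implied by P(event by deadline) = p_by_deadline,
    assuming P(t) = 1 - exp(-lambda * t)."""
    return -math.log(1 - p_by_deadline) / years_to_deadline

def implied_prob(p_by_deadline, years_to_deadline, years_elapsed):
    """Probability the event should already have occurred after
    years_elapsed, under the same constant-risk assumption."""
    lam = hazard_rate(p_by_deadline, years_to_deadline)
    return 1 - math.exp(-lam * years_elapsed)

# Hypothetical example: a forecast of 40% by 2026 made on a 5-year horizon,
# rescored 3 years in against whether the event has already resolved.
p_now = implied_prob(0.40, years_to_deadline=5, years_elapsed=3)
resolved = 0  # 1 if the event has already happened, else 0
brier = (p_now - resolved) ** 2
print(f"implied P(by now) = {p_now:.2f}, Brier contribution = {brier:.3f}")
```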
This is really cool! As someone who's been doing these calculations in a somewhat haphazard way using a mix of pen and paper, spreadsheets, and Python scripts for years, it's nice to see someone put in the work to create a polished product that others can use.
Something that I've been meaning to incorporate into my estimates, and that would be a killer feature for an app like this, is a reasonable projection of future earnings under the assumption that you'll get promoted or switch career paths at the average rate for someone in your current position. Sprinkle a bit of uncertainty on top, and you get a nice probability distribution over "time to FI" and "total money donated".
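As a very rough illustration of what I mean, here's a toy Monte Carlo sketch (every parameter below is a placeholder, and it ignores investment returns, taxes, and expense growth):

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_path(start_salary, expenses, donation_rate, fi_target,
                  promo_prob=0.15, promo_raise=0.20, growth_sd=0.03,
                  max_years=40):
    """One Monte Carlo path: salary grows by small noisy raises, with a
    chance each year of a promotion-sized jump. Returns (years to FI,
    total donated), where FI means savings >= fi_target.
    No investment returns or taxes are modeled."""
    salary, savings, donated = start_salary, 0.0, 0.0
    for year in range(1, max_years + 1):
        if rng.random() < promo_prob:              # promotion / career switch
            salary *= 1 + promo_raise
        salary *= 1 + rng.normal(0.02, growth_sd)  # ordinary raises
        donation = donation_rate * salary
        donated += donation
        savings += salary - expenses - donation
        if savings >= fi_target:
            return year, donated
    return max_years, donated

# All parameters below are placeholders, not recommendations.
paths = [simulate_path(start_salary=70_000, expenses=40_000,
                       donation_rate=0.10, fi_target=750_000)
         for _ in range(5_000)]
years, donations = np.array(paths).T
print(f"time to FI: median {np.median(years):.0f}y "
      f"(10th-90th pct: {np.percentile(years, 10):.0f}-{np.percentile(years, 90):.0f}y)")
print(f"total donated by FI: median ${np.median(donations):,.0f}")
```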
Thanks, Peter!
To your questions: