
This is a series of posts for the Bayesian in a hurry. The Bayesian who wants to put probabilities to quantities, but doesn't have the time or inclination to collect data, write code, or even use a calculator.

In these posts I'll give a template for doing probabilistic reasoning in your head. The goal is to provide heuristics which are memorable and easy to calculate, while being approximately correct according to standard statistics. If approximate correctness isn't possible, then I'll aim for intuitively sensible.

An example of when these techniques may come in handy is when you want to quickly generate probabilistic predictions for a forecasting tournament.

The first post in this series covered distributions over positive values with a lower bound, with estimating the lifespan of humanity as a motivating example. In this post, we'll build on that foundation to discuss probabilistic reasoning when you only have a single datapoint.

If you struggle with the mathematical details, feel free to skip to the TL;DRs.

Recap: Positive Values with a Lower Bound

First, a recap of the previous post. A process will survive for some unknown time $T$. So far, the process has survived from time $0$ to the present time $t$. With no additional knowledge, we have the following survival function:

$$S(T) = P(\text{survives until } T) = \frac{t}{T} \quad \text{for } T \ge t$$

which can be summarised as "the probability of surviving until $T$ is equal to the proportion of $T$ which has already been survived".

From this distribution, we get the following facts (the sketch below the list checks them numerically).

  • The median $T$ value is $2t$ (Lindy's Law).
  • With 50% confidence the process will survive at least an additional $t/3$ but at most an additional $3t$.
  • With 90% confidence the process will survive at least an additional $t/19$ but at most an additional $19t$.
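As a quick numerical sanity check, here's a minimal sketch (mine, not from the original post) that recovers these intervals by inverting the cumulative function $F(T) = 1 - t/T$:

```python
# Sketch: invert F(T) = 1 - t/T to recover the Lindy intervals.
# The quantile function is T(p) = t / (1 - p).

def lindy_quantile(t: float, p: float) -> float:
    """Age T by which the process has ended with probability p,
    given that it has already survived to age t."""
    return t / (1 - p)

t = 1.0  # time survived so far (arbitrary units)

# 50% interval: 25th to 75th percentile, minus the time already survived.
print(lindy_quantile(t, 0.25) - t, lindy_quantile(t, 0.75) - t)  # t/3, 3t

# 90% interval: 5th to 95th percentile.
print(lindy_quantile(t, 0.05) - t, lindy_quantile(t, 0.95) - t)  # t/19, 19t
```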

Positive Values with an Upper Bound

We now consider the case where a process is guaranteed to end by some known maximum time.

If all we know is that a number $X$ falls somewhere between $0$ and $X_{\max}$, then a sensible prior over possible values of $X$ is a uniform distribution over the range $[0, X_{\max}]$:

$$f(x) = \frac{1}{X_{\max}} \quad \text{for } 0 \le x \le X_{\max}$$

with a cumulative function

$$F(x) = \frac{x}{X_{\max}}$$
Note that we get the doomsday argument (and the $S(T) = t/T$ distribution from the recap above) by applying this treatment to the variable $t/T$, which has an upper bound of $1$.
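To see the connection, here's a small Monte Carlo sketch (mine, with assumed variable names): treating the elapsed fraction $t/T$ as uniform on $(0, 1)$ reproduces the $t/T$ survival probabilities.

```python
import random

# Sketch: the doomsday treatment. If the elapsed fraction f = t/T is
# uniform on (0, 1), then P(T > x) should come out to t/x for x >= t.
random.seed(0)
t = 10.0  # time survived so far
lifetimes = [t / (1 - random.random()) for _ in range(100_000)]  # T = t / f

for x in [20.0, 50.0, 100.0]:
    empirical = sum(T > x for T in lifetimes) / len(lifetimes)
    print(f"P(T > {x:.0f}): empirical {empirical:.3f} vs predicted {t / x:.3f}")
```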

Positive Distributions from One Datapoint

The Problem

You see an urn labelled "positive numbers". You draw a ball from the urn. The ball has "1.0" written on it. What can you say about the number which will be written on the next ball you draw from the urn?

There's some sense in which you can't really say anything. The value you drew gives you an estimate of the centre of the distribution of numbers, but you don't know anything about its variance. The values may be distributed from 0.99 to 1.01, or they may be distributed from 1 to 1,000,000. Who knows? We have no idea what kind of scale we're dealing with here.

If you're doing frequentist statistics, the problem shows up as follows: you can estimate the mean of the distribution from one datapoint, but the sample standard deviation is undefined.

If you're doing Bayesian statistics, then you can say something like "given such and such model with such and such prior over parameters, the posterior over parameters is such and such and so the expected distribution is such and such". What that expected distribution looks like will be highly sensitive to the details of "such and such".

But as this post points out:

Things that sometimes weigh 142 grams don't typically also sometimes weigh 12 solar masses. Similarly, things that take 5 minutes don't typically also take 5 days, and things that are 5 cm long aren't typically also 5 km long.

Can we come up with a hand-wave-y justification for some distribution which roughly matches this intuition?

Yes!

The Solution

What is the probability that the second ball has a larger number than the first ball? It's possible that balls with larger numbers are heavier and tend to accumulate at the bottom of the urn. In this case, you'd expect the numbers to increase on average as you draw further balls. But the opposite could just as easily be true. So by symmetry, we'd say there's an equal probability of the next ball having a larger or smaller number.
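This symmetry claim is easy to test empirically. A minimal sketch (mine): for independent draws from any continuous distribution, the second draw exceeds the first about half the time.

```python
import random

# Sketch: P(second draw > first draw) = 1/2 for i.i.d. continuous samples,
# regardless of the underlying distribution.
random.seed(0)
n = 100_000
for name, draw in [
    ("exponential", lambda: random.expovariate(1.0)),
    ("lognormal",   lambda: random.lognormvariate(0.0, 2.0)),
]:
    wins = sum(draw() < draw() for _ in range(n))
    print(name, wins / n)  # both print roughly 0.5
```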

What about the next ball having the same number as the first ball? Good question. I'm not sure how to answer that. To keep things simple, I'm going to assume we're dealing with a real-valued domain where the probability of two samples having exactly the same value is zero. So no integers and no Dirichlet processes.

So there's a 50/50 chance that the second sample is greater or smaller than the first sample. Let $X_0$ be the first sample, and $X_1$ be the second sample. Then we can consider the two cases separately:

  • 50% probability that $X_0$ is an upper bound, in which case the probability density function would be the uniform density $f(x) = \frac{1}{X_0}$ for $0 \le x \le X_0$.
  • 50% probability that $X_0$ is a lower bound, in which case the probability density function would be the Lindy density $f(x) = \frac{X_0}{x^2}$ for $x \ge X_0$.

Stitching these two cases together, we get the following density function:

$$f(x) = \begin{cases} \dfrac{1}{2X_0} & 0 \le x \le X_0 \\[6pt] \dfrac{X_0}{2x^2} & x > X_0 \end{cases}$$

And the following cumulative function:

$$F(x) = \begin{cases} \dfrac{x}{2X_0} & 0 \le x \le X_0 \\[6pt] 1 - \dfrac{X_0}{2x} & x > X_0 \end{cases}$$

Which we can plot as follows:

A probability distribution over positive reals from a single data point $X_0 = 0.75$. Blue is the density function and green is the cumulative function.
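Here's the same pair of functions transcribed into Python, a minimal sketch (function and variable names are mine) that also reproduces the headline numbers:

```python
def pdf(x: float, x0: float) -> float:
    """Density over positive reals implied by a single sample x0."""
    if x <= 0:
        return 0.0
    if x <= x0:
        return 1 / (2 * x0)    # uniform case: x0 is an upper bound
    return x0 / (2 * x ** 2)   # Lindy case: x0 is a lower bound

def cdf(x: float, x0: float) -> float:
    """Cumulative probability P(X <= x) implied by a single sample x0."""
    if x <= 0:
        return 0.0
    if x <= x0:
        return x / (2 * x0)
    return 1 - x0 / (2 * x)

x0 = 0.75  # the data point from the plot above
print(cdf(x0, x0))        # 0.5: the sample sits at the median
print(cdf(10 * x0, x0))   # 0.95: the 90% interval runs from x0/10 to 10*x0
print(cdf(0.1 * x0, x0))  # 0.05
```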

What does this distribution represent exactly? Our discussion so far has focused on the second sample from some distribution (specifically, the second draw from an urn of numbered balls). But there's nothing special about the second sample: the same reasoning applies to any sample. So in fact this distribution describes our "best guess" of the distribution which gave us our first sample. In Bayesian terms, it is our expected distribution.

We can also think of this as our posterior over the median value of the distribution. By definition, any sample has a 50/50 chance of being greater than or less than the median. It follows that the median has a 50/50 chance of being an upper bound or a lower bound of our first sample. So if we are estimating a point value with a noisy measurement, then this is our probability distribution over the point value given our measurement.
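The 50/50 claim about the median is exact for any continuous distribution, and easy to confirm by simulation. A minimal sketch (mine), using a lognormal whose true median is $e^\mu$:

```python
import math
import random

# Sketch: a single sample lands above the distribution's true median
# half the time. For a lognormal with parameters (mu, sigma), the
# median is exp(mu).
random.seed(0)
mu, sigma, n = 1.0, 2.0, 100_000
above = sum(random.lognormvariate(mu, sigma) > math.exp(mu) for _ in range(n))
print(above / n)  # roughly 0.5
```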

Example: Borogroves

To make this a bit more concrete and intuitive, imagine you come across a borogrove for the first time. This particular borogrove is 2m tall. Let's do a Q&A about borogroves.

Q: Are more borogroves taller or shorter than 2m?

A: I think half of borogroves are taller than 2m and half are shorter.

Q: So the median height of a borogrove is 2m?

A: Yes.

Q: What about the mean height?

A: The mean over the right tail of the height distribution diverges, so the mean height is undefined.

Q: How many borogroves are shorter than 1m?

A: The probability of a borogrove being shorter than 2m is 0.5. Conditional on being shorter than 2m, borogrove heights are evenly distributed, so the probability of being shorter than 1m is 1m/2m = 0.5. Multiplying these together gives an overall proportion of 0.25 of borogroves that are shorter than 1m.

Q: How many borogroves are taller than 4m?

A: The probability of a borogrove being taller than 2m is 0.5. Conditional on being taller than 2m, borogrove heights have a survival function $S(x) = \frac{2\text{m}}{x}$, so $S(4\text{m}) = 0.5$ (Lindy's Law again). Multiplying these together gives an overall proportion of 0.25 of borogroves that are taller than 4m.

Q: What is the 50% confidence interval for the height of a borogrove?

A: There's a 25% probability of being shorter than 1m and a 25% probability of being taller than 4m, so the 50% confidence interval is 1m to 4m.

Q: What about the 90% confidence interval?

A: If the borogrove is shorter than 2m, then it will be taller than 2m/10 = 0.2m with 90% confidence. If the borogrove is taller than 2m, then the height with a survival probability of 10% satisfies 2m/x = 0.1, so x = 2m/0.1 = 20m. So our 90% confidence interval is 0.2m to 20m. That is, one-tenth to ten times the value of our first sample.

Q: 99% CI?

A: By similar reasoning: 0.02m to 200m.
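All of these answers fall straight out of the cumulative function. A sketch reproducing the Q&A numerically (the cdf is restated so the snippet stands alone; names are mine):

```python
def cdf(x: float, x0: float) -> float:
    """P(height <= x) given a single observed height x0."""
    return x / (2 * x0) if x <= x0 else 1 - x0 / (2 * x)

x0 = 2.0  # the observed borogrove, in metres

print(cdf(1.0, x0))                     # 0.25: shorter than 1m
print(1 - cdf(4.0, x0))                 # 0.25: taller than 4m
print(cdf(0.2, x0), 1 - cdf(20.0, x0))  # 0.05 each: the 90% CI is 0.2m-20m
print(cdf(200.0, x0) - cdf(0.02, x0))   # 0.99: the 99% CI is 0.02m-200m
```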

TL;DR

If you take a sample $X_0$ from some distribution, then following samples will:

  • have median value $X_0$.
  • be between $X_0/2$ and $2X_0$ with 50% confidence.
  • be between $X_0/10$ and $10X_0$ with 90% confidence.
  • be between $X_0/100$ and $100X_0$ with 99% confidence.

These facts also apply to the median of the distribution. Thus if you've taken a noisy estimate $X_0$ of a point value, then the facts prescribe your beliefs about the point value.
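Wrapped up as a small helper (a sketch; the function name and interface are mine):

```python
def one_sample_interval(x0: float, confidence: float) -> tuple[float, float]:
    """Central confidence interval for the next sample (or for the median),
    given a single positive sample x0."""
    tail = (1 - confidence) / 2      # probability mass in each tail
    return 2 * tail * x0, x0 / (2 * tail)

print(one_sample_interval(1.0, 0.50))  # (0.5, 2.0)
print(one_sample_interval(1.0, 0.90))  # (0.1, 10.0)
print(one_sample_interval(1.0, 0.99))  # (0.01, 100.0)
```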

The Fundamental Conjecture of Lazy Statistics

We can summarise our discussion so far with what I call The Fundamental Conjecture of Lazy Statistics:

Given any two numbers, $a$ and $b$, you can get a sensible probability value by dividing the smaller number by the bigger number.

Examples:

  • A car has worked for five years. What is the probability that it will still be working after twenty years (including the original five)? Answer: 5/20 = 25%.
  • The first car for sale I see on some website costs $8k. What is the probability that the second car costs over $800k? Answer: 8/800 = 1%.

Actually the second probability should be divided by two (0.5%) because the second car could go higher or lower than the first car. But if you're dealing with low probabilities then this doesn't make much difference.
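As code, the whole conjecture fits in a couple of lines (a sketch; the `two_sided` flag handles the halving just described):

```python
def lazy_probability(a: float, b: float, two_sided: bool = False) -> float:
    """The Fundamental Conjecture of Lazy Statistics: divide the smaller
    number by the bigger one. Halve the result when the reference value
    could just as well have been an upper or a lower bound."""
    p = min(a, b) / max(a, b)
    return p / 2 if two_sided else p

print(lazy_probability(5, 20))                   # 0.25: the car example
print(lazy_probability(8, 800, two_sided=True))  # 0.005: the $800k car example
```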

The Fundamental Conjecture is not often the best tool to use when making predictions. It is, however, the easiest tool. Use it as a first pass on problems while your brain is booting up.

The hard part is often finding two numbers which you can divide in the first place.

Examples

If you are so inclined, try to solve these problems before looking at the answers.

Example 1: Radius of a planet

The Earth has a radius of about 6,000 km. A new exoplanet is discovered in a nearby star system. What is its radius?

Answer: 600 km to 60,000 km with 90% confidence, or 60 km to 600,000 km with 99% confidence.
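In code, this is just the TL;DR multipliers applied to the one data point we have (a sketch):

```python
earth_radius_km = 6_000  # the single data point

print(earth_radius_km / 10, earth_radius_km * 10)    # 90% CI: 600 to 60,000 km
print(earth_radius_km / 100, earth_radius_km * 100)  # 99% CI: 60 to 600,000 km
```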

Example 2: AI Timeline

According to this paper:

Researchers believe there is a 50% chance of AI outperforming humans in all tasks in 45 years and of automating all human jobs in 120 years.

When can we be 99% certain that AI will automate all human jobs? What is the probability of all human jobs being automated next year?

Answer: 99% confidence of all jobs automated within 120/1% = 12,000 years. Probability of job automation within one year is 1/120, which we then divide by two (because 120 years could be an upper or a lower bound), to get 1/240 ≈ 0.4% probability.
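And the same arithmetic as a sketch (assuming, as above, that the 120-year median is the number to divide by):

```python
median_years = 120  # median forecast for full job automation

print(median_years / 0.01)   # 12,000: years until automation is 99% certain
print(1 / median_years / 2)  # ~0.004: probability of automation within a year
```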

Next Time...

In the next installment in this series, we'll drop the non-negativity constraint and continue with real-valued distributions from a single data point. 

You may have noticed that this post didn't have any proper Bayesian or frequentist statistics. Hopefully we'll loop back to that next time.

After that we'll start looking at situations where we have multiple data points. Exciting!

Comments

Neat! Small mistake: "What is the probability that it will still be working after eight twenty years" should probably be "after twenty years". And multiple data points are exciting indeed!

Thanks. Fixed.
