David Johnston

I can see how this gets you $E[\text{value}_i \mid \text{comparisons}]$ for each item $i$, but not $P((\text{value}_i)_{i \in \text{items}} \mid \text{comparisons})$. One of the advantages Ozzie raises is the possibility to keep track of correlations in value estimates, which requires more than the marginal expectations.

Relative Value Functions: A Flexible New Format for Value Estimation

David Johnston3y1

So constructing a value ratio table means estimating a joint distribution of values from a subset of pairwise comparisons, then sampling from the distribution to fill out the table?

In that case, I think estimating the distribution is the hard part. Your example is straightforward because it features independent estimates, or simple functional relationships.
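
To make the point about joint versus marginal information concrete, here is a minimal sketch. Everything in it is an assumption for illustration: two hypothetical items whose log values are standard normal, with a `shared_weight` parameter that sets how much of each log value comes from a common factor. The marginals are the same in both runs, but the spread of the log value ratio is not, so the ratio column of a table cannot be filled in from the marginals alone.

```python
import random
import statistics

def sample_log_values(shared_weight, n=20000, seed=0):
    """Draw n pairs (log value_a, log value_b), each with unit variance.

    shared_weight in [0, 1] is the share of each log value's variance that
    comes from a common factor, so it equals the correlation between them.
    Both the items and the distribution are hypothetical.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        common = rng.gauss(0, 1)
        a = shared_weight ** 0.5 * common + (1 - shared_weight) ** 0.5 * rng.gauss(0, 1)
        b = shared_weight ** 0.5 * common + (1 - shared_weight) ** 0.5 * rng.gauss(0, 1)
        pairs.append((a, b))
    return pairs

def marginal_sd(pairs):
    """Standard deviation of the first item's log value."""
    return statistics.pstdev([a for a, _ in pairs])

def log_ratio_sd(pairs):
    """Standard deviation of log(value_a / value_b)."""
    return statistics.pstdev([a - b for a, b in pairs])

independent = sample_log_values(shared_weight=0.0)
correlated = sample_log_values(shared_weight=0.9)

print("marginal sd:", marginal_sd(independent), marginal_sd(correlated))
print("log-ratio sd:", log_ratio_sd(independent), log_ratio_sd(correlated))
```

In theory the log-ratio variance is $2(1 - \text{shared\_weight})$, so the correlated case has a much tighter ratio even though each item looks identical in isolation.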

Relative Value Functions: A Flexible New Format for Value Estimation

David Johnston3y1

The only piece of literature I had in mind was von Neumann and Morgenstern’s representation theorem. It says: if you have a set of probability distributions over a set of outcomes, and for each pair of distributions you have a preference (one is better than the other, or they are equal), and if this relation satisfies the additional requirements of transitivity, continuity and independence, then you can represent the preferences with a utility function that is unique up to positive affine transformation.

Given that this is a foundational result for expected utility theory, I don’t think it is unusual to think of a utility function as a representation of a preference relation.

Do you envision your value ratio table to be underwritten by a unique utility function? That is, could we assign a single number to every outcome $x$ such that the table cell corresponding to the outcome pair $(x, y)$ is always equal to $V(x) / V(y)$? These utilities could be treated as noisy estimates, which allows for correlations between $V(x)$ and $V(y)$ for some pairs.

My remarks concern what a value ratio table might be if it is more than just a “visualisation” of a utility function.
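
One way to test whether a given table is underwritten by a single value function is to check multiplicative consistency: for nonzero entries, a function $V$ with cell $(x, y)$ equal to $V(x) / V(y)$ exists exactly when cell $(x, y)$ times cell $(y, z)$ equals cell $(x, z)$ for every triple. The sketch below assumes a point-estimate table stored as a dictionary; the item names and numbers are made up.

```python
from itertools import product

def ratio_table(values):
    """Build the table whose (x, y) cell is V(x) / V(y)."""
    return {(x, y): values[x] / values[y] for x, y in product(values, repeat=2)}

def is_underwritten_by_values(table, items, tol=1e-9):
    """True if table[x, y] * table[y, z] == table[x, z] for all triples.

    For tables with nonzero entries this is equivalent to the existence of
    a single value function V with table[x, y] == V(x) / V(y).
    """
    return all(
        abs(table[x, y] * table[y, z] - table[x, z]) <= tol * abs(table[x, z])
        for x, y, z in product(items, repeat=3)
    )

items = ["a", "b", "c"]  # hypothetical outcomes
consistent = ratio_table({"a": 1.0, "b": 4.0, "c": 10.0})

# Same table, but one cell overridden by an independent judgement.
overridden = dict(consistent)
overridden["a", "b"] = 0.5  # the value function implies 0.25

print(is_underwritten_by_values(consistent, items))
print(is_underwritten_by_values(overridden, items))
```

A table that fails this check carries information that no single $V$ can reproduce, which is the case my remarks are about.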

Existential risk pessimism and the time of perils

David Johnston3y3

Because we are more likely to see no big changes than to see another big change.

if the risk is usually quite low (e.g. 0.001% per century), but sometimes jumps to a high value (e.g. 1% per century), the cumulative risk (over all time) may still be significantly below 100% (e.g. 90%) if the magnitude of the jumps decreases quickly, and risk does not stay high for long.

I would call this model “transient deviation” rather than “random walk” or “regular oscillation”.

Existential risk pessimism and the time of perils

David Johnston3y3

We can still get H4 if the amplitude of the oscillation or random walk decreases over time, right?

The average needs to fall, not the amplitude. If we're looking at risk in percentage points (rather than, say, logits, which might be a better parametrisation), small average implies small amplitude, but small amplitude does not imply small average.

Only if the sudden change has a sufficiently large magnitude, right?

The large magnitude is an observation - we have seen risk go from quite low to quite high over a short period of time. If we expect such large magnitude changes to be rare, then we might expect the present conditions to persist.
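
The average-versus-amplitude point can be checked numerically. This is a rough sketch under assumptions I am making up for illustration: a horizon of 1000 centuries, a sinusoidal oscillation, and two series with the same amplitude (0.001 percentage points) but different averages.

```python
import math

def cumulative_risk(per_century_risks):
    """Probability of at least one catastrophe, given independent per-century risks."""
    survival = 1.0
    for r in per_century_risks:
        survival *= 1.0 - r
    return 1.0 - survival

CENTURIES = 1000      # hypothetical horizon
AMPLITUDE = 0.00001   # 0.001 percentage points, identical in both series

# Tiny oscillation around a high average (1% per century).
high_average = [0.01 + AMPLITUDE * math.sin(t) for t in range(CENTURIES)]

# The same tiny oscillation around a low average (0.002% per century).
low_average = [0.00002 + AMPLITUDE * math.sin(t) for t in range(CENTURIES)]

print("high average:", cumulative_risk(high_average))
print("low average: ", cumulative_risk(low_average))
```

The first series ends up with cumulative risk close to 1 and the second close to 2%, even though the amplitude is the same in both. Shrinking the amplitude does nothing for cumulative risk unless the average comes down with it.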

Relative Value Functions: A Flexible New Format for Value Estimation

David Johnston3y1

FWIW I think the general kind of model underlying what I’ve written is a joint distribution that models value something like

Relative Value Functions: A Flexible New Format for Value Estimation

David Johnston3y3

Thought about this some more. This isn't a summary of your work, it's an attempt to understand it in my terms. Here's how I see it right now: we can use pairwise comparisons of outcomes to elicit preferences, and people often do, but they typically choose to insist that each outcome has a value representable as a single number and use the pairwise comparisons to decide which number to assign each outcome. Insisting that each outcome has a value is a constraint on preferences that can allow us to compute which outcome is preferred between two outcomes for which we do not have direct data.

I see this post as arguing that we should instead represent preferences as a table of value ratios. This is not about eliciting preferences, but representing them. Why would we want to represent them like this? At first glance:

If the important thing is that we represent preferences as a table, then we can capture every important comparison with a table of binary preferences.
If we want to impose additional constraints so that we can extrapolate preferences, preference ratios seem to push us back to assigning one or more values to every outcome.

What makes value ratios different from other schemes with multiple valuation functions is that value ratios give us a value function for each outcome we investigate. That is, there is a one-to-one correspondence between outcomes and value functions.

Here is a theory of why that might be useful: When we talk about the value of outcomes (such as "$5"), we are actually talking about that outcome in some context (such as "$5 for me now" or "$5 for someone who is very poor, now"). Preference relations can and do treat these outcomes as different depending on the context - $5 for me is worth less than $5 for someone who is very poor. Because of this, a value scale based on "$5-equivalents" will be different depending on the context of the $5.

A key proposition to motivate value ratios, Proposition 1: every outcome which we consider comes with a unique implied mixture of contexts. That is, if I say "the value of $5", I mean $\sum_{c} P_{\$5}(c)\, V(\$5 \mid c)$, where $P_{\$5}$ is the mixture of contexts implied by my having said "$5".

This means, if I want to compare "the value of $10m" to "the value of saving a child's life", I have two options: I can compare $\sum_{c} P_{\$10\text{m}}(c)\, V(\$10\text{m} \mid c)$ to $\sum_{c} P_{\$10\text{m}}(c)\, V(\text{save life} \mid c)$, or I can compare $\sum_{c} P_{\text{save life}}(c)\, V(\$10\text{m} \mid c)$ to $\sum_{c} P_{\text{save life}}(c)\, V(\text{save life} \mid c)$. These might give me different answers, and the correct comparison depends on which applied context I am considering these options in.

A value ratio table could therefore be considered a table where each column is a context and each row specifies the relative value of the given item in that context. Note that, under this interpretation, we should not expect $x_{ij} = \frac{1}{x_{ji}}$, unless $i = j$. This is because items have different values in different contexts.
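
Here is a small worked example of the two comparisons and the resulting asymmetry. All numbers, and the two context names, are hypothetical and chosen only so the effect is visible. I take cell $x_{ij}$ to be the value of item $i$ relative to item $j$, both averaged over the contexts implied by item $j$.

```python
# V[item][context]: value of the item in that context (hypothetical units).
V = {
    "$10m": {"rich donor": 1.0, "poor village": 50.0},
    "save life": {"rich donor": 20.0, "poor village": 40.0},
}

# P[item][context]: mixture of contexts implied by naming the item (hypothetical).
P = {
    "$10m": {"rich donor": 0.9, "poor village": 0.1},
    "save life": {"rich donor": 0.2, "poor village": 0.8},
}

def value_in(item, context_of):
    """Sum over c of P_{context_of}(c) * V(item | c)."""
    return sum(P[context_of][c] * V[item][c] for c in P[context_of])

def ratio(i, j):
    """Cell x_ij: value of item i relative to item j, in the contexts implied by j."""
    return value_in(i, j) / value_in(j, j)

x_life_vs_money = ratio("save life", "$10m")   # column context: "$10m"
x_money_vs_life = ratio("$10m", "save life")   # column context: "save life"

print(x_life_vs_money, x_money_vs_life, 1 / x_money_vs_life)
```

With these numbers both off-diagonal cells are above 1: saving a life looks better in the contexts implied by "$10m", and $10m looks better in the contexts implied by "saving a life". So $x_{ij} \neq 1 / x_{ji}$, and even the direction of the preference depends on the column.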

This can be extended to distributions over value ratios, in which case perhaps each sample comes with a context sampled from the distribution of contexts for that column of the table (I'm not entirely sure that works, but maybe it does). This can allow us to represent within-column correlations if we know that one outcome is $x$ times better than another, regardless of context.

I don't think Proposition 1 is plausible if we interpret it strictly. I'm pretty sure at different times people talk about the value of $5 with different implied contexts, and at other times I think people probably make some effort to consider the value of quite different outcomes in a common context. However, I think there still might be something to it. Whenever you're weighing up different outcomes, you definitely have an implicit context in mind. Furthermore, there probably is a substantial correlation between the context and the outcome: if two different people are considering the value of saving a child's life, then there probably is substantial overlap between the contexts they're considering. Moreover, it's plausible that context sensitivity is an issue for the kinds of value comparisons that EAs want to make.

Relative Value Functions: A Flexible New Format for Value Estimation

David Johnston3y3

I don't think it's all you are doing, that's why I wrote the rest of my comment (sorry to be flippant).

The point of bringing up binary comparisons is that a table of binary comparisons is a more general representation than a single utility function.
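
A quick illustration of why the table is more general, as a sketch with made-up items: any utility function induces a transitive table of strict comparisons, but a table can also hold a cycle, which no utility function can produce.

```python
from itertools import permutations

def table_from_utility(utility):
    """Complete table of strict comparisons induced by a utility function."""
    return {(x, y): utility[x] > utility[y] for x, y in permutations(utility, 2)}

def is_transitive(better):
    """True if a > b and b > c always implies a > c in the table."""
    items = {x for pair in better for x in pair}
    return all(
        better[a, c]
        for a, b, c in permutations(items, 3)
        if better[a, b] and better[b, c]
    )

from_utility = table_from_utility({"a": 3.0, "b": 2.0, "c": 1.0})

# A cyclic table: a > b, b > c, c > a.
cyclic = {
    ("a", "b"): True, ("b", "a"): False,
    ("b", "c"): True, ("c", "b"): False,
    ("c", "a"): True, ("a", "c"): False,
}

print(is_transitive(from_utility))
print(is_transitive(cyclic))
```

The cyclic table would need $u(a) > u(b) > u(c) > u(a)$, which is impossible, so it sits outside anything a single utility function can express.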

Relative Value Functions: A Flexible New Format for Value Estimation

David Johnston3y1

If all we are doing is binary comparisons between a set of items, it seems to me that it would be sufficient to represent relative values as a binary - i.e., is item1 better, or item2? Or perhaps you want a ternary function - you could also say they're equal.

Using a ratio instead of a binary indicator for relative values suggests that you want to use the function to extrapolate. I'm not sure that this approach helps much with that, though. For example,

costOfp001DeathChance = ss(10 to 10k) // Cost of a 0.001% chance of death, in dollars
chanceOfDeath001 = ss(-1 * costOfp001DeathChance * dollar1) // Cost of a 0.001% chance of death

does not tell me how many $ a 0.001% chance of death is worth; rather, it tells me how many times better it is than $1. Without a function f(outcome in $)->value, this doesn't enable a comparison to any other amount of dollars. We can, of course, add such a function to our estimation, but if we do then I think the function is doing much more than the value ratios to enable us to extrapolate our value judgements. Unless we have f(outcome2)=f(outcome1)*outcome2/outcome1, I don’t see how we can use ratios at all, but if we do have it then we’re back to single values.
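
To show how much work the function f is doing, here is a sketch that converts the same ratio into a dollar equivalent under two different assumed value functions. Both functions, and the ratio of 10, are hypothetical; the sign is ignored and only the magnitude is compared.

```python
import math

def dollar_equivalent(ratio_to_one_dollar, f, f_inverse):
    """Dollar amount whose value is ratio_to_one_dollar times the value of $1."""
    return f_inverse(ratio_to_one_dollar * f(1.0))

# Linear value of money: f(d) = d, so f(d2) = f(d1) * d2 / d1 holds.
def linear(d):
    return d

def linear_inverse(v):
    return v

# Concave value of money (an assumed alternative): f(d) = log(1 + d).
def concave(d):
    return math.log1p(d)

def concave_inverse(v):
    return math.expm1(v)

ratio = 10.0  # "10 times as valuable as $1" (hypothetical)

print(dollar_equivalent(ratio, linear, linear_inverse))
print(dollar_equivalent(ratio, concave, concave_inverse))
```

Under the linear function the ratio is the dollar amount, which is the single-value case. Under the concave function the same ratio corresponds to roughly $1023, so the ratio on its own pins down nothing about other dollar amounts.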

The alternative approach seems to me to be to treat it as a machine learning problem - given binary value judgements, build a binary classifier that tells you whether item1 or item2 is better. I expect that if we had value ratios instead of binary comparisons we might do a bit better here, but they might also be harder to elicit.
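
As a rough sketch of the machine learning framing, here is a tiny Bradley-Terry-style logistic model fitted by gradient ascent on made-up binary judgements. This is my choice of model for illustration, not something from the post, and it has an obvious limitation: it learns one score per item, so it quietly reintroduces the single-value constraint. A classifier over item features would not have to.

```python
import math

def fit_scores(items, comparisons, steps=2000, lr=0.1, l2=0.01):
    """Fit one score per item from (better, worse) pairs.

    Maximises the log-likelihood of sigmoid(score[better] - score[worse]),
    with a small L2 penalty so the scores stay finite.
    """
    score = {i: 0.0 for i in items}
    for _ in range(steps):
        grad = {i: 0.0 for i in items}
        for better, worse in comparisons:
            p = 1.0 / (1.0 + math.exp(-(score[better] - score[worse])))
            grad[better] += 1.0 - p
            grad[worse] -= 1.0 - p
        for i in items:
            score[i] += lr * (grad[i] - l2 * score[i])
    return score

def predict_better(score, a, b):
    """Predicted winner of a comparison that may never have been observed."""
    return a if score[a] >= score[b] else b

items = ["a", "b", "c", "d"]                       # hypothetical items
observed = [("a", "b"), ("b", "c"), ("c", "d")]    # hypothetical judgements

score = fit_scores(items, observed)
print(predict_better(score, "a", "d"))  # pair not in the observed data
```

The prediction for the unobserved pair comes entirely from the structure the model imposes, which is the sense in which extrapolation needs constraints beyond the comparisons themselves.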

Talking publicly about AI risk

David Johnston3y3

AFAIK the official MIRI solution to AI risk is to win the race to AGI but do it aligned.

Part of the MIRI theory is that winning the AGI race will give you the power to stop anyone else from building AGI. If you believe that, then it’s easy to believe that there is a race, and that you sure don’t want to lose.

