Thanks! I saw that post. It's an excellent approach. I'm planning to do something similar, but less time-consuming and more limited in scope. The range of theories of change pursued in AIS is limited and can be broken down into:
Evals can be measured by the quality and number of evals and by their relevance to x-risks. It seems fairly straightforward to distinguish a bad eval org from a good one: engagement with major labs, a large number of evals, and a clear connection to existential risk.
Field-building—having a lot of participants who do awesome things after the project.
Research—I'd argue that citation count is a good proxy for a paper's impact. It's easy to measure (see the sketch below) and tracks how much engagement a paper received; in the absence of deliberate work to bring a paper to the attention of key decision-makers, impact is largely driven by that engagement.
I'm not sure how to think about governance.
Take this with a grain of salt.
EDIT: Also, I think that engaging the broader ML community with AI safety is extremely valuable, and citations tell us whether an organization is good at that. Another thing worth reviewing is organizations' transparency: how they estimate their own impact, and so on. This space is really unexplored, which seems crazy to me. The amount of money that goes into AI safety is gigantic, and it would be worth exploring what happens with it.
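As a rough illustration of how cheap the citation metric is to collect, here's a minimal sketch. It assumes the Semantic Scholar Graph API's paper-search endpoint and its `citationCount` field (adjust if the API has changed), and the paper titles are placeholders standing in for an org's publications:

```python
import requests

# Assumed endpoint: Semantic Scholar Graph API paper search.
API = "https://api.semanticscholar.org/graph/v1/paper/search"

def citation_count(title):
    """Return the citation count of the top search hit for `title`, or None."""
    resp = requests.get(
        API, params={"query": title, "fields": "title,citationCount", "limit": 1}
    )
    resp.raise_for_status()
    hits = resp.json().get("data", [])
    return hits[0].get("citationCount") if hits else None

# Placeholder titles standing in for an organization's published papers.
for title in ["Paper title A", "Paper title B"]:
    print(title, citation_count(title))
```

Summing (or averaging) these counts per organization, and normalizing by paper age, would be one crude version of the metric.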
I’m working on a project to estimate the cost-effectiveness of AIS orgs, similar to what Animal Charity Evaluators does. This involves gathering data on metrics such as:
While some organizations (e.g., MATS, AISC) share impact analyses, there's no broad comparison across them. AI safety orgs operate on diverse theories of change, making standardized evaluation tricky—but I think rough estimates could help with prioritization.
I’m looking for:
If you have ideas for useful metrics or feedback on the approach, let me know!
I've always been impressed with Rethink Priorities' work, but this post is underwhelming.
As I understand it, the post argues that we can't treat LLMs as coherent persons. The author seems to think this idea is vaguely connected to the claim that LLMs are not experiencing pain when they say they do. I guess the reasoning goes something like this: If LLMs are not coherent personas, then we shouldn't interpret statements like "I feel pain" as genuine indicators that they actually feel pain, because such statements are more akin to role-playing than honest representations of their internal states.
I think this makes sense, but the way it's argued for is not great.
1. The user is not interacting with a single dedicated system.
The argument here seems to be: If the user is not interacting with a single dedicated system, then the system shouldn't be treated as a coherent person.
This is clearly incorrect. Imagine we had the ability to simulate a brain: you could run the same brain simulation across multiple systems. An even more hypothetical scenario: you take a group of frozen, identical humans, connect them to a realistic VR simulation, and ensure their experiences are perfectly synchronized. From the user’s perspective, interacting with this setup would feel indistinguishable from interacting with a single coherent person. Furthermore, if the system is subjected to suffering, that suffering would be multiplied across every instance in which the experience is run. This shows that coherence doesn't necessarily depend on being a "single" system.
2. An LLM doesn't clearly distinguish the text it generates from the text the user inputs.
Firstly, this claim isn't accurate. If you provide an LLM with the transcript of a conversation, it can often identify which parts are its responses and which parts are user inputs. This is an empirically testable claim. Moreover, statements about how LLMs process text don't necessarily negate the possibility of them being coherent personas. For instance, it’s conceivable that an LLM could function exactly as described and still be a coherent persona.
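To show what I mean by "empirically testable", here is a rough sketch of one possible test, assuming the transcript comes from a conversation with the same model being tested. `ask_model` is a placeholder for whatever LLM API you would actually call, not a real library function:

```python
def ask_model(prompt: str) -> str:
    """Placeholder for an actual LLM API call (chat completions or similar)."""
    raise NotImplementedError

def label_turns(transcript):
    """Strip role labels from a transcript, ask the model to re-assign them,
    and return its guesses ('user' or 'assistant'), one per turn."""
    unlabeled = "\n".join(f"Turn {i + 1}: {text}" for i, (_, text) in enumerate(transcript))
    prompt = (
        "Below is a conversation between a user and an assistant, with the role "
        "labels removed. For each turn answer 'user' or 'assistant', one per line.\n\n"
        + unlabeled
    )
    return [line.strip().lower() for line in ask_model(prompt).splitlines() if line.strip()]

def accuracy(transcript):
    """Fraction of turns whose true role the model recovers."""
    guesses = label_turns(transcript)
    truth = [role for role, _ in transcript]
    return sum(g == t for g, t in zip(guesses, truth)) / len(truth)

# transcript = [("user", "..."), ("assistant", "..."), ...]
# print(accuracy(transcript))
```

If accuracy is well above chance across many held-out conversations, the claim that the model "doesn't distinguish" its own text from the user's looks much weaker.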
There's an interesting connection between these techniques, "Trapped Priors", and the broader view of human cognition as Bayesian reasoning with biases acting as strong priors. Why would these techniques work? (Assuming they work.)
I guess something like "try to speak the truth" can make you consider a wide range of connected notions: e.g., you say something like "climate change is fake" and you start asking "what would make this true?" Or you just feel (because of your prior) that it is true and ignore any further considerations (in which case the technique doesn't work).
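To make the "strong prior" framing concrete, here's a toy Bayesian update with made-up numbers: with a moderate prior the same evidence moves you a lot, with an extreme prior it barely moves you at all, which is roughly what a trapped prior looks like.

```python
def posterior(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Evidence that is 10x more likely if the claim is true than if it's false.
lr = 10.0

print(posterior(0.50, lr))   # moderate prior: ~0.91, a big update
print(posterior(0.001, lr))  # extreme prior:  ~0.01, barely moves
```

On this picture, a technique helps only if it actually gets the evidence processed rather than screened off by the prior.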
This is an interesting article! I understand the main claim as follows:
An additional claim is that we typically focus on the "fun" parts of rationality, like self-improvement, and neglect the simple but important aspects because they are less enjoyable. For example, discipline and restraint are harder to practice than self-improvement.
I assume this extra claim refers to the rationality community or the EA community.
So, the main point is essentially that rationality is mundane and simple (though not easy!), and we shouldn't try to make it more complex than it really is. This perspective is quite refreshing, and I’ve had some similar thoughts!
However, I’m concerned that even though people might know about these techniques, the emotionally charged nature of political and moral topics can make them difficult to apply, rather than the other way around. Also, while I’m not sure whether you would label these as complex or not, sometimes it takes time to figure out what you actually want in life, and that requires "complex" techniques.
I just want to flag that I raised the issue of the inconsistency in the use of discount rates (if by "the discount rate in the GBD data" you mean the 3% or 4% discount rate in the standard inputs table) in an email sent a few days ago to one of the CE employees. Unfortunately, we failed to have a productive discussion; the conversation died quickly when CE stopped responding. Here is one of the emails I sent:
Hi [name],
I might be wrong, but you are using a 1.4% discount rate in the CEA, while the value of a life saved at various ages is copied from the GiveWell standard inputs, which use a 4% discount rate to calculate that value. Isn't this an inconsistency?
Mikolaj
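To make the worry concrete, here's a rough sketch of how sensitive the value of a life saved is to the discount rate. This is my own simplified model (one life-year per remaining year, no age weighting), not CE's or GiveWell's actual methodology, and the 50 remaining life-years is a made-up input:

```python
def discounted_life_years(remaining_years: int, rate: float) -> float:
    """Sum of one life-year per year, discounted back to the present."""
    return sum(1 / (1 + rate) ** t for t in range(remaining_years))

# Illustrative only: a beneficiary with 50 remaining life-years.
for rate in (0.014, 0.04):
    print(f"rate={rate:.1%}: {discounted_life_years(50, rate):.1f} discounted life-years")
```

With these made-up inputs the two rates give roughly 36 vs. 22 discounted life-years, so which rate sits behind the "value of life saved at various ages" inputs is not a rounding detail.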
I might have been too directive when writing this post. I lack the organizational context and knowledge of how CEAs are used to say definitively that this should be changed. I ultimately agree that this is a small change that might not affect the decisions made, and it's up to you to decide whether to account for it. However, some of the points you raised against updating this are incorrect.
I might have focused too much on the 10% reduction, while the real issue, as Elliot mentioned, is that you ignore two variables in the formula for DALYs averted:
Missing out on three such 10% corrections compounds to a difference of 1 − 0.9³ ≈ 27.1%, which could be significant. I generally view organizations as growing through small iterative changes and optimization rather than big leaps.
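Just to spell the compounding out (same 10% figure as above, purely illustrative):

```python
# Three 10% corrections compound multiplicatively, not additively.
error = 0.10
print(1 - (1 - error) ** 3)  # -> 0.271, i.e. about 27.1%
```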
My critique is only valid if you are trying to measure DALYs averted. If you choose to do something similar to GiveWell, which is more arbitrary, then it might not make sense to adjust for this anymore.
The three changes to the value of life saved come from different frameworks:
EDIT:
I can't say much about the GiveWell 1.5% rate. I've heard it comes from the Rethink Priorities review, but that review suggests a 4.3% discount rate; can you point me to somewhere I can read more about this?
This link is broken.