I'm Jonathan Nankivell, an undergraduate in my last year studying Mathematics. My interests are in ML and collaborative epistemics.
I had to discover EA twice before it stuck. My first random walk was 'psychology -> big five framework -> principal component analysis -> pol.is -> radical exchange -> EA' and my second was 'effect of social media -> should I read the news? -> Ezra Klein on the 80,000 hours podcast -> EA'.
Drop-the-Loser (DTL) is an urn-based method where treatment assignment is determined by a simulated urn. By removing balls when a treatment fails, and by adding balls of every type uniformly so that no type runs out, we can balance the trial allocation in a sensible way.
The actual algorithm, for a two-treatment study:
Consider an urn containing three types of balls. Balls of types 1 and 2 represent treatments. Balls of type 0 are termed immigration balls. When a subject arrives, one ball is drawn at random. If a treatment ball of type k (k = 1 or 2) is selected, the k-th treatment is given to the subject and the response is observed. If it is a failure, the ball is not replaced. If the treatment is a success, the ball is replaced and consequently the urn composition remains unchanged. If an immigration ball (type 0) is selected, no subject is treated, and the ball is returned to the urn together with two additional treatment balls, one of each treatment type. This procedure is repeated until a treatment ball is drawn and the subject treated accordingly. The function of the immigration ball is to avoid the extinction of a type of treatment ball.
(Source)
Extending DTL to multi-treatment settings is as simple as adding additional ball types.
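For concreteness, here is a minimal simulation sketch of the two-treatment procedure quoted above. The success probabilities, the initial urn composition and the sample size are my own illustrative choices, not anything from the source.

```python
import random

def drop_the_loser(p_success, n_patients, initial_balls=5, seed=0):
    """Simulate Drop-the-Loser allocation for two treatments with Bernoulli responses.

    p_success: success probabilities for treatments 1 and 2.
    Returns the number of patients assigned to each treatment.
    """
    rng = random.Random(seed)
    urn = [1, initial_balls, initial_balls]  # urn[0]: immigration balls; urn[1], urn[2]: treatment balls
    assigned = [0, 0]

    for _ in range(n_patients):
        while True:
            ball = rng.choices([0, 1, 2], weights=urn)[0]  # draw one ball at random
            if ball == 0:
                # Immigration ball: return it and add one ball of each treatment type,
                # then draw again. This stops either treatment type from going extinct.
                urn[1] += 1
                urn[2] += 1
                continue
            # Treatment ball: treat the patient and observe the response.
            assigned[ball - 1] += 1
            if rng.random() >= p_success[ball - 1]:
                urn[ball] -= 1  # failure: the ball is not replaced ("drop the loser")
            break
    return assigned

# The worse treatment loses balls faster, so allocation skews towards the better one.
print(drop_the_loser(p_success=[0.5, 0.7], n_patients=500))
```

Running this repeatedly shows the allocation proportion drifting towards the better treatment while never abandoning the other, which is the behaviour the simulation studies below quantify.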
Simulation studies show that DTL performs very well as a way to maximise statistical power. I've read that this is because it (1) approaches the optimal allocation ratio asymptotically and (2) has lower variance in its allocation than other proposed methods, although I don't have an intuitive understanding of why this is.
I've had a look around and this paper has a nice summary of the method (and proposes how it should handle delayed responses).
Thanks for the pointers. I've reread this now - it is a good read and interesting throughout!
Intriguing though health economics is, I think I am more focused on the actual treatments. I want to understand, when faced with a patient, the methods that can be used to pick the treatment. I am interested in how effective each method would be for the patient and how well they would scale to an entire population.
It seems strange that these questions have been omitted in a book comparing healthcare systems.
I have been considering what I call the fundamental question of healthcare: 'what healthcare system gives the best treatment to the most patients?' This question, simple though it is, seems to lack any established answer.
The importance of this question seems self-evident. Is anyone aware of any research or proposals that seek to address it?
Update: I emailed Alex Tabarrok to get his thoughts on this. He originally proposed using dominant assurance contracts to solve public good problems, and he has experience testing it empirically.
He makes the following points about my suggestion:
He also suggested other ways to solve the same problem:
My current (and weakly held) position is that flipping editorial boards to create new open-access journals is the best way to improve publishing standards. Small steps towards a much better world. Would it be possible for the Future Fund to entice 80% of the big journals to do this? The top journal in every field? Maybe.
Research Coordination Projects
Research that can help us improve
At the root of many problems that are being discussed are coordination problems. People are in prisoners' dilemmas, and keep defecting. This is the case in the suggestion to buy a scientific journal: if the universities coordinated they could buy the journal, remove fees, improve editorial policies, and they would be in a far better situation. Since they don't coordinate, they have to pay to access their own research.
Research into this type of coordination problem has revealed two general strategies for overcoming the prisoners' dilemma type effects: quadratic funding and dominant assurance contracts.
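As a toy illustration of the first of these (the contribution figures below are made up): in its idealised form, quadratic funding tops up each project so that the total it receives equals the square of the sum of the square roots of the individual contributions, which rewards many coordinated small contributors far more than a single large one.

```python
from math import sqrt

def quadratic_funding(contributions):
    """Return (total funding, matching subsidy) for one project under idealised quadratic funding.

    contributions: list of individual contributions to the project.
    The project receives (sum of sqrt(c_i))**2; the matching pool pays the difference.
    """
    raised = sum(contributions)
    funded = sum(sqrt(c) for c in contributions) ** 2
    return funded, funded - raised

# 100 people giving 1 each attract a 9900 match; one person giving 100 attracts none.
print(quadratic_funding([1] * 100))  # (10000.0, 9900.0)
print(quadratic_funding([100]))      # (100.0, 0.0)
```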
I propose a research project to investigate opportunities to use these techniques, which, if appropriate, would get bankrolled by the Future Fund.
We could, of course, simply get the Future Fund to pay for this. There is, however, an alternative that might be worth thinking about.
This seems like the kind of thing that dominant assurance contracts are designed to solve. We could run a Kickstarter, and use the Future Fund to pay the early backers if we fail to reach the target amount. This should incentivise all those who want the journals bought to chip in.
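To make the incentive structure concrete, here is a toy sketch of an early backer's outcomes under a dominant assurance contract (the figures and the flat failure bonus are illustrative assumptions, not part of any actual proposal):

```python
def backer_outcome(pledge, total_pledged, target, failure_bonus):
    """Outcome for an early backer of a dominant assurance contract.

    If total pledges reach the target, the pledge is collected and the project goes ahead.
    If not, the pledge is refunded and the backer also receives a bonus,
    paid by the guarantor (here, that would be the Future Fund).
    """
    if total_pledged >= target:
        return f"pledge of {pledge} collected; journals bought and opened up"
    return f"pledge of {pledge} refunded, plus a {failure_bonus} bonus for backing early"

# Either way the backer is no worse off than if they had stayed out,
# which is what makes pledging a (weakly) dominant strategy.
print(backer_outcome(pledge=100, total_pledged=600_000, target=500_000, failure_bonus=10))
print(backer_outcome(pledge=100, total_pledged=200_000, target=500_000, failure_bonus=10))
```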
Here is one way we could do this:
Credence Weighted Citation Metrics
Epistemic Institutions
Citation metrics (total citations, h-index, g-index, etc.) are intended to estimate a researcher's contribution to a field. However, if false claims get cited more than true claims (Serra-Garcia and Gneezy 2021), these citation metrics are clearly not fit for purpose.
I suggest modifying these citation metrics by weighting each paper by the probability that it will replicate. If each paper $i$ has $c_i$ citations and probability $p_i$ of replicating, we can modify each formula as follows: instead of measuring total citations $\sum_i c_i$, we consider credence-weighted total citations $\sum_i p_i c_i$. Instead of using the h-index, where we pick the largest number $h$ such that $h$ articles have $c_i \geq h$, we could use the credence-weighted h-index, where we pick the largest number $h$ such that $h$ articles have $p_i c_i \geq h$. We can use this idea to modify citation metrics that evaluate researchers (as above), journals (Impact Factor and CiteScore) and universities (rankings).
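A minimal sketch of the two modified metrics defined above, assuming we already have each paper's citation count and replication probability (the figures below are made up):

```python
def h_index(scores):
    """Largest h such that at least h of the scores are >= h."""
    scores = sorted(scores, reverse=True)
    h = 0
    for i, s in enumerate(scores, start=1):
        if s >= i:
            h = i
    return h

# Each paper is a (citations c_i, probability of replicating p_i) pair.
papers = [(100, 0.2), (40, 0.9), (30, 0.8), (25, 0.1), (10, 0.95)]

total_citations = sum(c for c, _ in papers)         # 205
weighted_citations = sum(p * c for c, p in papers)  # 92.0
plain_h = h_index([c for c, _ in papers])           # 5
weighted_h = h_index([p * c for c, p in papers])    # 4

# The highly cited papers that are unlikely to replicate contribute little
# to the credence-weighted versions of either metric.
print(total_citations, weighted_citations, plain_h, weighted_h)
```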
We can use prediction markets to elicit these probabilities, where the questions are resolved using a combination of large scale replication studies and surrogate scoring. DARPA SCORE is a proof of concept that this can be done on a large scale.
Prioritising credence-weighted citation metrics over raw citation metrics would improve the incentives researchers have. No longer will they have to compete with people who write 70 flimsy papers a year that no one actually thinks will replicate; now researchers who are right will be rewarded.
Self-Improving Healthcare
Biorisk and Recovery from Catastrophe, Epistemic Institutions, Economic Growth
Our healthcare systems aren't perfect. One underdiscussed part of this is that we learn almost nothing from the vast majority of treatment that happens. I'd love to see systems that learn from the day-to-day process of treating patients, systems that use automatic feedback loops and crowd wisdom to detect and correct mistakes, and that identify, test and incorporate new treatments. It should be possible to do this. Below is my suggestion.
I suggest we allocate treatments to patients in a specific way: the probability that we allocate a treatment to a patient should match the probability that that treatment is the best treatment for that patient. This will create an RCT of similar patients, which we can use to update the probabilities that we use for allocation. Then repeat. This will maximise the number of patients given the best treatment in the medium to long term. It does this by detecting and correcting mistakes, and by cautiously testing novel treatments and then, if warranted, rolling them out to the wider population.
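A minimal sketch of this allocation rule, under my own simplifying assumptions: binary outcomes, independent Beta posteriors for each treatment's success rate, and the 'probability of being best' estimated by posterior sampling. This is just one natural way to get the probabilities; the post linked below discusses where they should really come from.

```python
import random

def prob_best(successes, failures, n_draws=5000, rng=random):
    """Estimate, for each treatment, the probability that it has the highest success rate,
    using Beta(1 + successes, 1 + failures) posteriors and Monte Carlo sampling."""
    k = len(successes)
    wins = [0] * k
    for _ in range(n_draws):
        samples = [rng.betavariate(1 + successes[i], 1 + failures[i]) for i in range(k)]
        wins[samples.index(max(samples))] += 1
    return [w / n_draws for w in wins]

def allocate_next_patient(successes, failures, rng=random):
    """Assign the next patient to treatment i with probability equal to
    the probability that treatment i is the best one."""
    probs = prob_best(successes, failures, rng=rng)
    return rng.choices(range(len(probs)), weights=probs)[0]

# Illustrative outcomes observed so far: treatment 1 looks better than treatment 0,
# so it is allocated more often, but treatment 0 is still tried occasionally,
# which is how mistakes get detected and corrected.
successes, failures = [8, 12], [12, 8]
print(prob_best(successes, failures))
print(allocate_next_patient(successes, failures))
```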
This idea is still in its early stages. More detailed thoughts (such as where the probabilities come from) can be found here. If you have any thoughts or feedback, please get in touch.
Good comment and good points.
I guess the aim of my post was two-fold:
You're definitely right that the caveat is a large one. Adaptive designs are not appropriate everywhere, which is why this post raises points for discussion and doesn't provide a fixed prescription.
To respond to your specific points.
Section three discusses whether adaptive designs lead to
My understanding of the authors' position is that it depends on the trial design. Drop-the-Loser, for example, would perform very well on issues 1 through 4. Other methods, less so. I only omit 5 because CROs are currently ill-equipped to run these studies; there's no fundamental reason for this, and if demand increased, this obstacle would shrink. In the meantime, this unfortunately does raise the burden on the investigating team.
This is not an objection I've heard before. I presume the effect of this would be equivalent to the presence of a time trend. Hence some designs would perform well (DTL, DBCD, etc.) and others wouldn't (TS, FLGI, etc.).
This is often true, although generalised methods built to address this can be found. See here for example.
In summary: while I think these difficulties can often be overcome, they should not be ignored. Teams should go in with their eyes open, aware that they may have to do more themselves than is typical. Read, discuss, make a plan, implement it. Know each option's drawbacks. Also know their advantages.
Hope that makes sense.