I'm Jonathan Nankivell, an undergraduate in my last year studying Mathematics. My interests are in ML and collaborative epistemics.
I had to discover EA twice before it stuck. My first random walk was 'psychology -> big five framework -> principal component analysis -> pol.is -> radical exchange -> EA' and my second was 'effect of social media -> should I read the news? -> Ezra Klein on the 80,000 hours podcast -> EA'.
Drop-the-Loser (DTL) is an urn-based method where treatment assignment is determined by a simulated urn. By removing balls when a treatment fails, and by adding balls of every type uniformly so that no type runs out, we can balance the trial allocation in a sensible way.
The actual algorithm, for a two-treatment study:
Consider an urn containing three types of balls. Balls of types 1 and 2 represent treatments. Balls of type 0 are termed immigration balls. When a subject arrives, one ball is drawn at random. If a treatment ball of type k (k = 1 or 2) is selected, the k-th treatment is given to the subject and the response is observed. If it is a failure, the ball is not replaced. If the treatment is a success, the ball is replaced and consequently the urn composition remains unchanged. If an immigration ball (type 0) is selected, no subject is treated, and the ball is returned to the urn together with two additional treatment balls, one of each treatment type. This procedure is repeated until a treatment ball is drawn and the subject treated accordingly. The function of the immigration ball is to avoid the extinction of a type of treatment ball.
(Source)
Extending DTL to multi-treatment settings is as simple as adding additional ball types.
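For concreteness, here is a minimal simulation sketch of the two-treatment procedure quoted above. The success probabilities, the initial urn composition and the sample size are my own illustrative choices, not anything from the source.

```python
import random

def drop_the_loser(p_success, n_patients, initial_balls=5, seed=0):
    """Simulate Drop-the-Loser allocation for two treatments with Bernoulli responses.

    p_success: success probabilities for treatments 1 and 2.
    Returns the number of patients assigned to each treatment.
    """
    rng = random.Random(seed)
    urn = [1, initial_balls, initial_balls]  # urn[0]: immigration balls; urn[1], urn[2]: treatment balls
    assigned = [0, 0]

    for _ in range(n_patients):
        while True:
            ball = rng.choices([0, 1, 2], weights=urn)[0]  # draw one ball at random
            if ball == 0:
                # Immigration ball: return it and add one ball of each treatment type,
                # then draw again. This stops either treatment type from going extinct.
                urn[1] += 1
                urn[2] += 1
                continue
            # Treatment ball: treat the patient and observe the response.
            assigned[ball - 1] += 1
            if rng.random() >= p_success[ball - 1]:
                urn[ball] -= 1  # failure: the ball is not replaced ("drop the loser")
            break
    return assigned

# The worse treatment loses balls faster, so allocation skews towards the better one.
print(drop_the_loser(p_success=[0.5, 0.7], n_patients=500))
```

Running this repeatedly shows the allocation proportion drifting towards the better treatment while never abandoning the other, which is the behaviour the simulation studies below quantify.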
Simulation studies show that DTL performs very well as a way to maximise statistical power. I've read that this is because it (1) approaches the optimal allocation ratio asymptotically and (2) has lower variance in its allocation than other proposed methods, although I don't have an intuitive understanding of why this is.
I've had a look around and this paper has a nice summary of the method (and proposes how it should handle delayed responses).
Thanks for the pointers. I've reread this now - it is a good read and interesting throughout!
Intriguing though health economics is, I think I am more focused on the actual treatments. I want to understand, when faced with a patient, the methods that can be used to pick the treatment. I am interested in how effective each method would be for the patient and how well they would scale to an entire population.
It seems strange that these questions have been omitted in a book comparing healthcare systems.
I have been considering what I call the fundamental question of healthcare: 'what healthcare system gives the best treatment to the most patients?' This question, simple though it is, seems to lack any established answer.
The importance of this question seems self-evident. Is anyone aware of any research or proposals that seek to address it?
Update: I emailed Alex Tabarrok to get his thoughts on this. He originally proposed using dominant assurance contracts to solve public good problems, and he has experience testing it empirically.
He makes the following points about my suggestion:
He also suggested other ways to solve the same problem:
My current (and weakly held) position is that flipping editorial boards to create new open-access journals is the best way to improve publishing standards. Small steps towards a much better world. Would it be possible for the Future Fund to entice 80% of the big journals to do this? The top journal in every field? Maybe.
Research Coordination Projects
Research that can help us improve
At the root of many problems that are being discussed are coordination problems. People are in prisoners' dilemmas, and keep defecting. This is the case in the suggestion to buy a scientific journal: if the universities coordinated they could buy the journal, remove fees, improve editorial policies, and they would be in a far better situation. Since they don't coordinate, they have to pay to access their own research.
Research into this type of coordination problem has revealed two general strategies for overcoming the prisoners' dilemma type effects: quadratic funding and dominant assurance contracts.
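As a toy illustration of the first of these (the contribution figures below are made up): in its idealised form, quadratic funding tops up each project so that the total it receives equals the square of the sum of the square roots of the individual contributions, which rewards many coordinated small contributors far more than a single large one.

```python
from math import sqrt

def quadratic_funding(contributions):
    """Return (total funding, matching subsidy) for one project under idealised quadratic funding.

    contributions: list of individual contributions to the project.
    The project receives (sum of sqrt(c_i))**2; the matching pool pays the difference.
    """
    raised = sum(contributions)
    funded = sum(sqrt(c) for c in contributions) ** 2
    return funded, funded - raised

# 100 people giving 1 each attract a 9900 match; one person giving 100 attracts none.
print(quadratic_funding([1] * 100))  # (10000.0, 9900.0)
print(quadratic_funding([100]))      # (100.0, 0.0)
```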
I propose a research project to investigate opportunities to use these techniques, which, if appropriate, would get bankrolled by the Future Fund.
We could, of course, simply get the Future Fund to pay for this. There is, however, an alternative that might be worth thinking about.
This seems like the kind of thing that dominant assurance contracts are designed to solve. We could run a Kickstarter, and use the Future Fund to pay the early backers if we fail to reach the target amount. This should incentivise all those who want the journals bought to chip in.
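To make the incentive structure concrete, here is a toy sketch of an early backer's outcomes under a dominant assurance contract (the figures and the flat failure bonus are illustrative assumptions, not part of any actual proposal):

```python
def backer_outcome(pledge, total_pledged, target, failure_bonus):
    """Outcome for an early backer of a dominant assurance contract.

    If total pledges reach the target, the pledge is collected and the project goes ahead.
    If not, the pledge is refunded and the backer also receives a bonus,
    paid by the guarantor (here, that would be the Future Fund).
    """
    if total_pledged >= target:
        return f"pledge of {pledge} collected; journals bought and opened up"
    return f"pledge of {pledge} refunded, plus a {failure_bonus} bonus for backing early"

# Either way the backer is no worse off than if they had stayed out,
# which is what makes pledging a (weakly) dominant strategy.
print(backer_outcome(pledge=100, total_pledged=600_000, target=500_000, failure_bonus=10))
print(backer_outcome(pledge=100, total_pledged=200_000, target=500_000, failure_bonus=10))
```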
Here is one way we could do this:
Credence Weighted Citation Metrics
Epistemic Institutions
Citation metrics (total citations, h-index, g-index, etc.) are intended to estimate a researcher's contribution to a field. However, if false claims get cited more than true claims (Serra-Garcia and Gneezy 2021), these citation metrics are clearly not fit for purpose.
I suggest modifying these citation metrics by weighting each paper by the probability that it will replicate. If each paper $i$ has $c_i$ citations and probability $p_i$ of replicating, we can modify each formula as follows: instead of measuring total citations $\sum_i c_i$, we consider credence-weighted total citations $\sum_i p_i c_i$. Instead of using the h-index, where we pick the largest number $h$ such that $h$ articles have $c_i \geq h$, we could use the credence-weighted h-index, where we pick the largest number $h$ such that $h$ articles have $p_i c_i \geq h$. We can use this idea to modify citation metrics that evaluate researchers (as above), journals (Impact Factor and CiteScore) and universities (rankings).
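A minimal sketch of the two modified metrics defined above, assuming we already have each paper's citation count and replication probability (the figures below are made up):

```python
def h_index(scores):
    """Largest h such that at least h of the scores are >= h."""
    scores = sorted(scores, reverse=True)
    h = 0
    for i, s in enumerate(scores, start=1):
        if s >= i:
            h = i
    return h

# Each paper is a (citations c_i, probability of replicating p_i) pair.
papers = [(100, 0.2), (40, 0.9), (30, 0.8), (25, 0.1), (10, 0.95)]

total_citations = sum(c for c, _ in papers)         # 205
weighted_citations = sum(p * c for c, p in papers)  # 92.0
plain_h = h_index([c for c, _ in papers])           # 5
weighted_h = h_index([p * c for c, p in papers])    # 4

# The highly cited papers that are unlikely to replicate contribute little
# to the credence-weighted versions of either metric.
print(total_citations, weighted_citations, plain_h, weighted_h)
```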
We can use prediction markets to elicit these probabilities, where the questions are resolved using a combination of large scale replication studies and surrogate scoring. DARPA SCORE is a proof of concept that this can be done on a large scale.
Prioritising credence-weighted citation metrics over raw citation metrics would improve the incentives researchers have. No longer will they have to compete with people who write 70 flimsy papers a year that no one actually thinks will replicate; now researchers who are right will be rewarded.
Self-Improving Healthcare
Biorisk and Recovery from Catastrophe, Epistemic Institutions, Economic Growth
Our healthcare systems aren't perfect. One underdiscussed part of this is that we learn almost nothing from the vast majority of treatment that happens. I'd love to see systems that learn from the day-to-day process of treating patients, systems that use automatic feedback loops and crowd wisdom to detect and correct mistakes, and that identify, test and incorporate new treatments. It should be possible to do this. Below is my suggestion.
I suggest we allocate treatments to patients in a specific way: the probability that we allocate a treatment to a patient should match the probability that that treatment is the best treatment for that patient. This will create an RCT of similar patients, which we can use to update the probabilities that we use for allocation. Then repeat. This will maximise the number of patients given the best treatment in the medium to long term. It does this by detecting and correcting mistakes, and by cautiously testing novel treatments and then, if warranted, rolling them out to the wider population.
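A minimal sketch of this allocation rule, under my own simplifying assumptions: binary outcomes, independent Beta posteriors for each treatment's success rate, and the 'probability of being best' estimated by posterior sampling. This is just one natural way to get the probabilities; the post linked below discusses where they should really come from.

```python
import random

def prob_best(successes, failures, n_draws=5000, rng=random):
    """Estimate, for each treatment, the probability that it has the highest success rate,
    using Beta(1 + successes, 1 + failures) posteriors and Monte Carlo sampling."""
    k = len(successes)
    wins = [0] * k
    for _ in range(n_draws):
        samples = [rng.betavariate(1 + successes[i], 1 + failures[i]) for i in range(k)]
        wins[samples.index(max(samples))] += 1
    return [w / n_draws for w in wins]

def allocate_next_patient(successes, failures, rng=random):
    """Assign the next patient to treatment i with probability equal to
    the probability that treatment i is the best one."""
    probs = prob_best(successes, failures, rng=rng)
    return rng.choices(range(len(probs)), weights=probs)[0]

# Illustrative outcomes observed so far: treatment 1 looks better than treatment 0,
# so it is allocated more often, but treatment 0 is still tried occasionally,
# which is how mistakes get detected and corrected.
successes, failures = [8, 12], [12, 8]
print(prob_best(successes, failures))
print(allocate_next_patient(successes, failures))
```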
This idea is still in its early stages. More detailed thoughts (such as where the probabilities come from) can be found here. If you have any thoughts or feedback, please get in touch.
Good comment and good points.
I guess the aim of my post was two-fold:
You're definitely right that the caveat is a large one. Adaptive designs are not appropriate everywhere, which is why this post raises points for discussion and doesn't provide a fixed prescription.
To respond to your specific points.
Section three discusses whether adaptive designs lead to
My understanding of the authors' position is that it depends on the trial design. Drop-the-Loser, for example, would perform very well on issues 1 through 4. Other methods, less so. I only omit 5 because CROs are currently ill-equipped to run these studies; there's no fundamental reason for this, and if demand increased, this obstacle would shrink. In the meantime, this unfortunately does raise the burden on the investigating team.
This is not an objection I've heard before. I presume the effect of this would be equivalent to the presence of a time trend. Hence some designs would perform well (DTL, DBCD, etc.) and others wouldn't (TS, FLGI, etc.).
This is often true, although generalised methods built to address this can be found. See here for example.
In summary: while I think these difficulties can often be overcome, they should not be ignored. Teams should go in with their eyes open, aware that they may have to do more themselves than is typical. Read, discuss, make a plan, implement it. Know each option's drawbacks. Also know their advantages.
Hope that makes sense.