Summary
I argue that, based on its methodology, the DALY metric optimizes for the survival of people who are able to perform tasks similar to those of others. I criticize the disability weight (DW) component of the metric: the DW study sample was unrepresentative, the survey focused on task performance, health state descriptions were prone to researcher bias, and the neutral point was not incorporated. I include three examples of disputable DALY-based prioritization.
This is an entry for the Criticism and Red Teaming Contest. I would most appreciate a discussion of the extent to which the DALY metric should be used.
The writing is inspired by this question.
YLL and YLD sum
The Institute for Health Metrics and Evaluation (IHME) calculates the number of disability-adjusted life years that an age and sex group in a country loses in a year due to a cause by the following formula:
$$\mathrm{DALY} = \mathrm{YLL} + \mathrm{YLD}$$
(Vos et al., 2020, p. 1431) where
- DALY is the number of disability-adjusted life years lost to that group due to the cause
- YLL is the number of years of life lost compared to the standard life expectancy (p. 56)
- YLD is the equivalent of years of life lost due to disability
YLL data compilation
The years of life lost data compilation is relatively straightforward: it uses vital registration (VR) statistics, supplemented by other source data where the VR records are incomplete (p. 22). The data are adjusted for misdiagnoses, noise, outliers, and shocks (pp. 20–48).
YLD calculation
The equivalent of years lost due to disability is the product of the disability weight (DW) and the total duration of a condition in a population in a year (p. 476). This is also a relatively simple calculation with little to dispute.
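The two calculations described so far can be put together in a minimal sketch. It assumes that the total duration of a condition is expressed in person-years. The function names and the example figures (1,000 person-years, 50 years of life lost) are hypothetical and are not taken from IHME. Only the DW of 0.184 for severe vision impairment comes from the GBD 2019 disability weights.

```python
def yld(disability_weight: float, person_years_with_condition: float) -> float:
    """Equivalent years lost due to disability: DW times total duration."""
    if not 0.0 <= disability_weight <= 1.0:
        raise ValueError("disability weight must lie between 0 and 1")
    return disability_weight * person_years_with_condition


def daly(yll: float, yld_value: float) -> float:
    """DALYs for one group and cause: the sum of YLL and YLD."""
    return yll + yld_value


# Hypothetical group: 1,000 person-years lived with a condition of DW 0.184
# and 50 years of life lost to the same cause (illustrative numbers only).
example_yld = yld(0.184, 1000.0)
example_daly = daly(50.0, example_yld)
print(round(example_yld, 3), round(example_daly, 3))
```

In this sketch the disability weight is the only input that rests on survey judgments rather than on counts, which is why the rest of this post focuses on it.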
Disability weights
Of the DALY components, the disability weight (DW) computation has the most steps that can introduce bias: study sample selection, survey framing, health state description, and survey data analysis.
Unrepresentative study sample selection
The disability weights were estimated in 2010–2011 and 2013 using 1) an online survey shared in the researchers’ networks and in journals; 2) computer-assisted in-person interviews in Bangladesh, Indonesia, Peru, and Tanzania; 3) telephone interviews in the US; and 4) an online survey in Hungary, Italy, the Netherlands, and Sweden (pp. 472–473).
Although the samples meant to represent the continent, US, and European region demographics were selected at random from national populations (p. 472), they could be unrepresentative in practice: 1) healthcare elites may have biased perceptions of the severity of different conditions if they experience or treat certain types of conditions at a different rate than the global population experiences them; 2) the interpretation of health in one country can differ from that elsewhere on the continent, due to different prevalence or perception of conditions; 3) cold-call responses in the US can be subject to selection bias; and 4) a national population with internet access can be unrepresentative of the European region population as a whole, which includes people without internet access. Further, although the European surveys validated some weights and covered additional ones (p. 474), it is otherwise unclear how the data from the four survey types were weighted with respect to each other.
Task-focused survey framing
The disability weights survey asked participants which of two persons in two health states they found healthier. The introduction prompted the interpretation of health as the ability to perform tasks that others do:
A person’s health may limit how well parts of his body or mind work. As a result, some people are not able to do all of the things in life that others may do, and some people are more severely limited than others. I am going to ask you a series of questions about different health problems (p. 473).
Thus, the disability weights may indicate one's ability to pursue activities similar to those of others in one's environment rather than one's subjective perception of one's own physical and mental health.
Researcher bias-prone health state descriptions
The health state descriptions omit curability, focus on the ability to perform tasks, assume one’s perceptions and ability to adjust, and state others’ attitudes. These factors can lead the respondents to answer in a way that a small group of researchers assumes to be accurate.
It is unclear, but it seems that respondents were asked only about sequelae, not about conditions:
The basis of the GBD disability weight survey assessments are lay descriptions of sequelae highlighting major functional consequences and symptoms (p. 62) … DWs were estimated for additional sequelae that were incorporated into GBD 2013 but had not been included in GBD 2010 (p. 474).
Thus, DWs may omit any differences in health perceptions for conditions with different curability. For example, the disability weight of the health state where one “has a persistent cough and fever, shortness of breath, night sweats, weakness and fatigue and severe weight loss” (GBD 2019, 2020, line 3) is the same whether this is due to a curable or incurable condition.
Some of the sequelae descriptions focus on one’s ability to perform tasks. For example, a mild acute infection patient “has a low fever and mild discomfort, but no difficulty with daily activities” (line 31), and a moderate acute infection patient “has a fever and aches, and feels weak, which causes some difficulty with daily activities” (line 33). This can motivate respondents to evaluate the relative ability to perform tasks; the responses could focus on subjective feelings to a greater extent if the latter parts were omitted: ‘has a low fever and mild discomfort’ and ‘has a fever and aches, and feels weak.’
One’s subjective perceptions are assumed. For instance, it is given that “[e]arly HIV without anemia” (line 17) and “[m]ild sickle cell/beta-thalassemia, without anemia” (line 2048) “causes some worry.” However, various patients can perceive these conditions differently. Others’ attitudes are also assumed. For example, it is stated that squamous cell carcinoma (skin cancer) “causes others to stare and comment” (line 754). However, it is possible that some groups do not stare or comment. Further, one’s ability to adjust is estimated by experts rather than the respondents. For example, a completely blind trachoma patient is assumed to have “great difficulty going outside the home without assistance” (line 209).
Unclear survey data analysis
A probit regression was used to arrange health states on a scale from 0 to 1, where states more commonly chosen as the healthier of the pair were placed closer to 0 and vice versa (p. 474). The methodology is not described at length, but judging by the graph of the probit function (“Probit,” 2022), the disability weights could increase sharply at the beginning and at the end of the perceived severity scale and more gradually in the middle. A sharp increase followed by a much smaller one can be observed for vision impairment (moderate, severe, and blindness), with DWs of 0.031, 0.184, and 0.187 (lines 207–209). However, the schistosomiasis sequelae do not follow this pattern (lines 175–185). Thus, it is unclear how the DW value scales with severity, and in particular whether it increases linearly.
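The shape referred to above can be checked numerically. The following is a minimal sketch that evaluates only the probit function itself (the inverse of the standard normal distribution function) and compares the steps between the three vision impairment weights. It is not a reconstruction of the IHME regression, and the variable names are my own.

```python
from statistics import NormalDist


def probit(p: float) -> float:
    """Probit function: inverse of the standard normal CDF, for 0 < p < 1."""
    return NormalDist().inv_cdf(p)


# Equal 0.1-wide steps in p near the edge and in the middle of the (0, 1) range.
edge_step = probit(0.15) - probit(0.05)
middle_step = probit(0.55) - probit(0.45)

# Steps between the vision impairment DWs (moderate, severe, blindness).
vision_dws = [0.031, 0.184, 0.187]
dw_steps = [round(b - a, 3) for a, b in zip(vision_dws, vision_dws[1:])]

print(edge_step, middle_step, dw_steps)
```

The probit value changes more over a step near the edge of the range than over an equally wide step in the middle, which is the pattern I read off the graph. The two DW steps for vision impairment differ greatly in size, but this alone does not show how the regression output was mapped onto the 0 to 1 scale.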
No neutral point
It was assumed that death is the worst health state, with a DW of 1 (p. 473). The neutral point, “where someone is neither satisfied nor dissatisfied” (Key Ideas, n.d.), was not included in the health state comparison surveys. One could argue that some health states, such as a severe episode of major depressive disorder (DW=0.657) (line 1193), can be subjectively perceived as negative, that is, as falling below the neutral point.
Disputable prioritization results
I hypothesize that some disability weights can bias global health spending prioritization. For example, using the DALY metric, ceteris paribus,
- If 100 blindness (DW=0.187) cases or 102 severe vision impairment (DW=0.184) cases could be cured at the same price, the latter should be prioritized, since 102 > 100 × 0.187/0.184 ≈ 101.6.
- If preventing one case of mild motor and cognitive impairment (DW=0.031) (line 404) costs 11 times as much as preventing one case of mild distance vision impairment (DW=0.003) (line 521), then the latter preterm birth complications program should be prioritized, since 0.031/0.003 ≈ 10.3 < 11.
- If preventing severe tooth loss (DW=0.067) (line 2117) costs the same as preventing the combination of mild chronic abdominal pain (DW=0.011) (line 1919), mild chronic respiratory problems (DW=0.019) (line 1993), primary infertility (DW=0.008) (line 1888), syndactyly (DW=0.011) (line 1872), borderline intellectual disability (DW=0.011) (line 1825), and mild oral disorders (DW=0.006) (line 2119), whose weights sum to 0.066, then it is unclear which program to choose.
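The arithmetic behind these three examples can be reproduced with a short sketch. It assumes that curing or preventing a case averts the full disability weight for the same duration in every case, and that the weights of the combined conditions simply add up. Both are simplifications of mine rather than part of the GBD methodology, and the variable names are hypothetical.

```python
# Disability weights as quoted in the three examples above.
DW_BLINDNESS = 0.187
DW_SEVERE_VISION = 0.184

# Example 1: total weight averted by each program at the same price.
blindness_total = 100 * DW_BLINDNESS
severe_vision_total = 102 * DW_SEVERE_VISION
# Number of severe vision impairment cases matching 100 blindness cases.
break_even_cases = 100 * DW_BLINDNESS / DW_SEVERE_VISION

# Example 2: weight ratio to compare against the 11-fold cost ratio.
dw_ratio = 0.031 / 0.003

# Example 3: severe tooth loss versus the sum of six milder conditions.
DW_SEVERE_TOOTH_LOSS = 0.067
combination = [0.011, 0.019, 0.008, 0.011, 0.011, 0.006]
combination_total = round(sum(combination), 3)

print(round(blindness_total, 3), round(severe_vision_total, 3))
print(round(break_even_cases, 1), round(dw_ratio, 1))
print(DW_SEVERE_TOOTH_LOSS, combination_total)
```

Under these assumptions the first two comparisons are decided by margins of under 1% and about 6% respectively, and the third by a difference of 0.001 in summed weight, so small changes in any of the weights would reverse the ranking.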
Conclusion
Based on its calculation methodology, the DALY metric prioritizes programs that maximize the standard survival of persons who can perform tasks similar to those that others do. The metric captures subjective perceptions of health only to a limited extent and does not reflect wellbeing.
References
Global Burden of Disease Study 2019 (GBD 2019) Disability Weights. (2020). [Data set]. Institute for Health Metrics and Evaluation (IHME). https://doi.org/10.6069/1W19-VX76
Key Ideas. (n.d.). Happier Lives Institute. Retrieved July 8, 2022, from https://www.happierlivesinstitute.org/key-ideas/
Probit. (2022). In Wikipedia. https://en.wikipedia.org/w/index.php?title=Probit&oldid=1082006366
Vos, T., Lim, S. S., Abbafati, C., Abbas, K. M., Abbasi, M., Abbasifard, M., Abbasi-Kangevari, M., Abbastabar, H., Abd-Allah, F., Abdelalim, A., Abdollahi, M., Abdollahpour, I., Abolhassani, H., Aboyans, V., Abrams, E. M., Abreu, L. G., Abrigo, M. R. M., Abu-Raddad, L. J., Abushouk, A. I., … Murray, C. J. L. (2020). Supplementary appendix 1 to Global burden of 369 diseases and injuries in 204 countries and territories, 1990–2019: A systematic analysis for the Global Burden of Disease Study 2019. The Lancet, 396(10258), 1204–1222. https://doi.org/10.1016/S0140-6736(20)30925-9