Review – Statistics in Urology
Reporting and Interpreting Decision Curve Analysis: A Guide for Investigators
Introduction
Clinical risk prediction models are commonly developed in urology and other medical fields to predict the probability, or risk, of a current disease state (eg, biopsy-detectable aggressive prostate cancer) or a future event (eg, cancer recurrence) [1], [2], [3]. Such models are usually evaluated with statistical measures of discrimination and calibration. Discrimination evaluates how well the predicted risks distinguish between patients with and without the disease; the c-statistic is the most commonly used measure. Calibration evaluates the reliability of the estimated risks: if a model predicts a 10% risk, on average 10 of 100 such patients should have the disease [1], [4]. Assessments of calibration may include graphs and statistics such as observed-versus-expected ratios or calibration slopes. Although a model with better discrimination and calibration should theoretically be a better guide to clinical management [4], [5], [6], these statistical measures fall short when we want to evaluate whether a risk model improves clinical decision making. They cannot tell us whether it is beneficial to use a model to make clinical decisions, or which of two models leads to better decisions, especially when one model has better discrimination and the other better calibration [7].
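To make the two measures concrete, here is a minimal numpy sketch (not from the article; the data are synthetic) of the c-statistic and the observed-versus-expected ratio, a simple calibration summary:

```python
import numpy as np

def c_statistic(y, p):
    """Probability that a randomly chosen case receives a higher
    predicted risk than a randomly chosen non-case (ties count 1/2)."""
    pos = p[y == 1]
    neg = p[y == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def oe_ratio(y, p):
    """Observed / expected events; 1.0 indicates good calibration-in-the-large."""
    return y.mean() / p.mean()

# synthetic example: outcomes drawn from the predicted risks themselves,
# so the predictions are well calibrated by construction
rng = np.random.default_rng(0)
p = rng.uniform(0, 1, 1000)
y = (rng.uniform(0, 1, 1000) < p).astype(int)
print(round(c_statistic(y, p), 2), round(oe_ratio(y, p), 2))
```

A full calibration assessment would also examine a calibration plot and slope; this sketch only shows the summary statistics named above.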
To overcome this limitation, decision-analytic measures have been developed to summarize how well a model supports decision making. We focus on net benefit (NB) as the key component of decision curve analysis (DCA), which was introduced in 2006 [8]. Editorials supporting DCA have been published in leading medical journals including JAMA, Lancet Oncology, Journal of Clinical Oncology, BMJ, PLoS Medicine, and Annals of Internal Medicine [9], [10], [11], [12], [13], [14], [15], [16], [17]. Importantly, evaluating NB is recommended by the TRIPOD guidelines for prediction models [18]. DCA is widely used within urology and many other clinical fields. A Web of Science search (September 11, 2018) revealed that the 2006 paper was cited 703 times in total. DCA was most often cited in journals from urology and nephrology (176 citations), oncology (147), and general and internal medicine (76). European Urology is the journal with the most citations (45).
However, based on numerous personal discussions, we have noticed that researchers struggle with the interpretation and reporting of NB. We therefore aim to provide an investigators' guide to NB and DCA. A case study on prediction of high-grade prostate cancer is used as an illustrative example.
Evidence acquisition
We informally reviewed the urological literature to determine investigators’ understanding of DCA. To illustrate, we use data from 3616 patients to develop risk models for high-grade prostate cancer (n = 313, 9%) to decide who should undergo a biopsy. The baseline model includes prostate-specific antigen (PSA) and digital rectal examination; the extended model adds two predictors based on transrectal ultrasound (TRUS).
Case study: prediction of high-grade prostate cancer to decide who to biopsy
Screening with PSA results in overdiagnosis of indolent prostate cancer [19]. Risk calculators have been developed for high-grade prostate cancer [20]. Using these models to decide who to biopsy can reduce unnecessary biopsies, which are aversive procedures that carry a risk of sepsis and can lead to detection of indolent disease. Detecting high-grade prostate cancer remains important, because early detection of these potentially lethal cancers can lead to curative treatment [21].
The Rotterdam Prostate Cancer
Conclusions
DCA is a statistical method to evaluate whether a model has utility in supporting clinical decisions, and which of two models leads to the best decisions. It is therefore an essential validation tool on top of measures such as discrimination and calibration.
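The NB computation that underlies a decision curve is not shown in this excerpt, but the standard definition from the 2006 paper [8] is NB = TP/n − (FP/n) × pt/(1 − pt), where pt is the risk threshold at which a patient would opt for biopsy. A minimal sketch on synthetic data (not the Rotterdam cohort) comparing a model against the "biopsy all" and "biopsy none" reference strategies:

```python
import numpy as np

def net_benefit(y, p, pt):
    """Net benefit of biopsying patients whose predicted risk p >= threshold pt:
    NB = TP/n - FP/n * pt / (1 - pt)."""
    n = len(y)
    treat = p >= pt
    tp = np.sum(treat & (y == 1))
    fp = np.sum(treat & (y == 0))
    return tp / n - fp / n * pt / (1 - pt)

def nb_treat_all(y, pt):
    """Reference strategy 'biopsy everyone'; 'biopsy no one' has NB = 0."""
    prev = np.mean(y)
    return prev - (1 - prev) * pt / (1 - pt)

# synthetic cohort with roughly 10% prevalence, loosely echoing the case study
rng = np.random.default_rng(1)
p = rng.beta(1, 9, 2000)
y = (rng.uniform(size=2000) < p).astype(int)
for pt in (0.05, 0.10, 0.20, 0.30):   # threshold range plausible for biopsy decisions
    print(pt, round(net_benefit(y, p, pt), 3), round(nb_treat_all(y, pt), 3))
```

Plotting NB against a range of thresholds for the model and both reference strategies yields the decision curve; the model has clinical utility at thresholds where its NB exceeds both references.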
Author contributions: Ben Van Calster had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Van Calster, Wynants, Vickers,
References (36)
- et al. Everything you always wanted to know about evaluating prediction models (but were too afraid to ask). Urology (2010)
- et al. A calibration hierarchy for risk models was defined: from utopia to empirical data. J Clin Epidemiol (2016)
- et al. Nomograms in oncology: more than meets the eye. Lancet Oncol (2015)
- et al. Screening and prostate cancer mortality: results of the European Randomised Study of Screening for Prostate Cancer (ERSPC) at 13 years of follow-up. Lancet (2014)
- et al. Detection of high-grade prostate cancer using a urinary molecular biomarker-based risk score. Eur Urol (2016)
- et al. EAU-ESTRO-SIOG guidelines on prostate cancer. Part 1: screening, diagnosis, and local treatment with curative intent. Eur Urol (2017)
- et al. A risk-based strategy improves prostate-specific antigen-driven detection of prostate cancer. Eur Urol (2010)
- et al. Prediction of prostate cancer risk: the role of prostate volume and digital rectal examination in the ERSPC risk calculators. Eur Urol (2012)
- et al. Combined clinical parameters and multiparametric magnetic resonance imaging for advanced risk modeling of prostate cancer – patient-tailored risk stratification can reduce unnecessary biopsies. Eur Urol (2017)
- Clinical prediction models: a practical approach to development, validation, and updating (2009)
- Critical review of prostate cancer predictive tools. Future Oncol
- Calibration of risk prediction models: impact on decision-analytic performance. Med Decis Making
- Understanding the value of individualized information: the impact of poor calibration or discrimination in outcome prediction models. Med Decis Making
- Incorporating clinical considerations into statistical analyses of markers: a quiet revolution in how we think about data. Clin Chem
- Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making
- Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests. BMJ
- Prediction models: revolutionary in principle, but do they do more good than harm? J Clin Oncol
- Beyond the usual prediction accuracy metrics: reporting results for clinical decision making. Ann Intern Med
† These authors are joint first authors.