European Urology

Volume 74, Issue 6, December 2018, Pages 796-804

Review – Statistics in Urology
Reporting and Interpreting Decision Curve Analysis: A Guide for Investigators

https://doi.org/10.1016/j.eururo.2018.08.038

Abstract

Context

Urologists regularly develop clinical risk prediction models to support clinical decisions. In contrast to traditional performance measures, decision curve analysis (DCA) can assess the utility of models for decision making. DCA plots net benefit (NB) at a range of clinically reasonable risk thresholds.

Objective

To provide recommendations on interpreting and reporting DCA when evaluating prediction models.

Evidence acquisition

We informally reviewed the urological literature to determine investigators’ understanding of DCA. To illustrate, we use data from 3616 patients to develop risk models for high-grade prostate cancer (n = 313, 9%) to decide who should undergo a biopsy. The baseline model includes prostate-specific antigen and digital rectal examination; the extended model adds two predictors based on transrectal ultrasound (TRUS).

Evidence synthesis

We explain risk thresholds, NB, default strategies (treat all, treat no one), and the test tradeoff. To use DCA, first determine whether a model is superior to all other strategies across the range of reasonable risk thresholds. If so, that model appears to improve decisions irrespective of threshold. Second, consider whether there are important extra costs to using the model. If so, obtain the test tradeoff to check whether the increase in NB versus the best other strategy is worth the additional cost. In our case study, addition of TRUS improved NB by 0.0114, equivalent to 1.1 more detected high-grade prostate cancers per 100 patients. Hence, adding TRUS would be worthwhile if we accept subjecting 88 patients to TRUS to find one additional high-grade prostate cancer or, alternatively, subjecting 10 patients to TRUS to avoid one unnecessary biopsy.

Conclusions

The proposed guidelines can help researchers understand DCA and improve application and reporting.

Patient summary

Decision curve analysis can identify risk models that can help us make better clinical decisions. We illustrate appropriate reporting and interpretation of decision curve analysis.

Introduction

Clinical risk prediction models are commonly developed in urology and other medical fields to predict the probability or risk of a current disease (eg, biopsy-detectable aggressive prostate cancer), or a future state (eg, cancer recurrence) [1], [2], [3]. Such models are usually evaluated with statistical measures for discrimination and calibration. Discrimination evaluates how well the predicted risks distinguish between patients with and without disease. The c-statistic is the most commonly used measure for discrimination. Calibration evaluates the reliability of the estimated risks: if we predict 10%, on average 10 out of 100 patients should have the disease [1], [4]. Assessments of calibration may include graphs and statistics such as observed versus expected ratios or calibration slopes. Although a model with better discrimination and calibration should theoretically be a better guide to clinical management [4], [5], [6], statistical measures fall short when we want to evaluate whether the risk model improves clinical decision making. Such measures cannot inform us whether it is beneficial to use a model to make clinical decisions or which of two models leads to better decisions, especially if one model has better discrimination and the other better calibration [7].

To overcome this limitation, decision-analytic measures have been developed to summarize the performance of a model in supporting decision making. We focus on net benefit (NB) as the key component of decision curve analysis (DCA), which was introduced in 2006 [8]. Editorials supporting DCA have been published in leading medical journals including JAMA, Lancet Oncology, Journal of Clinical Oncology, BMJ, PLoS Medicine, and Annals of Internal Medicine [9], [10], [11], [12], [13], [14], [15], [16], [17]. Importantly, evaluating NB is recommended by the TRIPOD guidelines for prediction models [18]. DCA is widely used within urology and many other clinical fields. A Web of Science search (September 11, 2018) revealed that the 2006 paper was cited 703 times in total. DCA was most often cited in journals from urology and nephrology (176 citations), oncology (147), and general and internal medicine (76). European Urology is the journal with the most citations (45).

However, based on various personal discussions, we have noticed that researchers struggle with the interpretation and reporting of NB. We therefore aim to provide an investigators’ guide to NB and DCA. A case study on the prediction of high-grade prostate cancer serves as an illustrative example.

Evidence acquisition

We informally reviewed the urological literature to determine investigators’ understanding of DCA. To illustrate, we use data from 3616 patients to develop risk models for high-grade prostate cancer (n = 313, 9%) to decide who should undergo a biopsy. The baseline model includes prostate-specific antigen (PSA) and digital rectal examination; the extended model adds two predictors based on transrectal ultrasound (TRUS).

Case study: prediction of high-grade prostate cancer to decide who to biopsy

Screening with PSA results in overdiagnosis of indolent prostate cancer [19]. Risk calculators have been developed for high-grade prostate cancer [20]. Using these models to decide whom to biopsy can reduce unnecessary biopsies, which are unpleasant procedures that carry a risk of sepsis and lead to the detection of indolent disease. Detecting high-grade prostate cancer is important, because early detection of these potentially lethal cancers can lead to curative treatment [21].

The Rotterdam Prostate Cancer

Conclusions

DCA is a statistical method to evaluate whether a model has utility in supporting clinical decisions, and which of two models leads to the best decisions. It is therefore an essential validation tool on top of measures such as discrimination and calibration.

Author contributions: Ben Van Calster had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Van Calster, Wynants, Vickers,

References (36)

  • S.F. Shariat et al. Critical review of prostate cancer predictive tools. Future Oncol (2009)
  • B. Van Calster et al. Calibration of risk prediction models: impact on decision-analytic performance. Med Decis Making (2015)
  • N. Olchanski et al. Understanding the value of individualized information: the impact of poor calibration or discrimination in outcome prediction models. Med Decis Making (2017)
  • A.J. Vickers. Incorporating clinical considerations into statistical analyses of markers: a quiet revolution in how we think about data. Clin Chem (2016)
  • A.J. Vickers et al. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making (2006)
  • A.J. Vickers et al. Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests. BMJ (2016)
  • A.J. Vickers. Prediction models: revolutionary in principle, but do they do more good than harm? J Clin Oncol (2011)
  • A.R. Localio et al. Beyond the usual prediction accuracy metrics: reporting results for clinical decision making. Ann Intern Med (2012)
These authors are joint first authors.