Objective The concept of the ‘self-fulfilling prophecy’ is well established in intracerebral haemorrhage (ICH). The ability to improve prognostication and prediction of long-term outcomes during the first days of hospitalisation is important in guiding conversations around goals of care. We previously demonstrated that incorporating delayed imaging into various prognostication scores for ICH improves the predictive accuracy of 90-day mortality. However, delayed prognostication scores have not been used to predict long-term functional outcomes beyond 90 days.
Design, setting and participants We analysed data from the ICH Deferoxamine trial to determine whether delaying the calculation of prognostication scores to 96 hours after ICH onset would improve their performance in predicting outcomes at 180 days. A total of 276 patients were included.
Interventions and measurements We calculated the original ICH score (oICH), modified-ICH score (MICH), max-ICH score and the FUNC score on presentation (baseline), and on day 4 (delayed). Outcomes assessed were mortality and poor functional outcome in survivors (defined as modified Rankin Scale of 4–5) at 180 days. We generated receiver operating characteristic curves, and measured the area under the curve values (AUC) for mortality and functional outcome. We compared baseline and delayed AUCs with non-parametric methods.
Results At 180 days, 21 of 276 (7.6%) died. Out of the survivors, 54 of 255 had poor functional outcome (21.2%). The oICH, MICH and max-ICH performed significantly better at predicting 180-day mortality when calculated 4 days later compared with their baseline equivalents ((0.74 vs 0.83, p=0.005), (0.73 vs 0.80, p=0.036), (0.74 vs 0.83, p=0.008), respectively). The delayed calculation of these scores did not significantly improve our accuracy for predicting poor functional outcomes.
Conclusion Delaying the calculation of prognostication scores in acute ICH until day 4 improved prediction of 6-month mortality but not functional outcomes.
Trial registration number ClinicalTrials.gov Registry (NCT02175225).
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Intracerebral haemorrhage (ICH) represents 10%–15% of all stroke subtypes, but it is the most fatal and devastating.1 Early neurological deterioration is common, often due to haematoma expansion or worsening of oedema.2 Due to these additional complications, ICH often has a slower rate of recovery than other stroke subtypes.3 A large proportion of ICH survivors continue to functionally improve, up to 1 year after the event.4 In this context, prognosticating early within the disease course is highly discouraged by current practice guidelines.5 Furthermore, early prognostication has been shown to impact outcomes in ICH, as perceived poor prognosis results in early withdrawal of care, which inevitably leads to worse outcomes. The ability to improve prognostication in ICH during the first days of hospitalisation is essential, and can help guide conversations around goals of care.
We have previously demonstrated that the use of delayed imaging in acute ICH can improve the predictive performance of multiple ICH scores for 3-month mortality.6 However, it is unclear if similar results would be observed if delayed clinical information were also incorporated, and if the same would hold true for prediction of functional outcomes. Furthermore, given accumulating evidence that the timeline for ICH recovery is longer than 3 months,7–9 it is unknown if these delayed scores would perform similarly if assessed at a later time point.
The objective of this study was to determine if delaying the calculation of multiple ICH prognostication scores10–13 to day 4 would improve their predictive accuracy for mortality and poor functional outcome at various time points, including 180 days after the event. We hypothesised that ICH scores using delayed clinical and imaging data would perform better than their baseline counterparts for predicting both mortality and poor functional outcome.
Data availability and patient consent
We analysed data from the iDEF (ICH Deferoxamine) trial.9 This trial was supported by the National Institute of Neurological Disorders and Stroke (grant number U01NS074425). Access to the data can be obtained through a formal proposal to the iDEF authors.
iDEF was a prospective, multicentre, placebo-controlled, double-blinded, phase 2 randomised clinical trial that assessed the safety of deferoxamine mesylate (DFO) in patients with acute ICH.9 The trial was conducted at 40 hospitals in Canada and the USA, and recruited patients between November 2014 and 2017. Patients enrolled were between the ages of 18 and 80 years, presenting with spontaneous, supratentorial ICH, and were administered either DFO or placebo within 24 hours of haemorrhage onset. Infusions were continued for 3 consecutive days (24 hours apart). Key exclusion criteria included suspected secondary causes for ICH, infratentorial location, coagulopathy (defined as international normalised ratio >1.3 or activated partial thromboplastin time >40 s), use of direct oral anticoagulants or heparin, disability on presentation (modified Rankin Scale (mRS) of 2 or more), coma (Glasgow Coma Scale (GCS) of 6 or less) and any indication that withdrawal of care would be implemented within 72 hours. Pre-ICH cognitive impairment was ascertained based on documented history from medical records or self-reported history of cognitive impairment by the patient or family members. Further details about the inclusion/exclusion criteria can be found in the original publication.9 CT scans were obtained on arrival (baseline) and on day 4 after presentation. CT scans were sent to the iDEF core imaging laboratory for analyses by blinded assessors. ICH volumes were analysed with imaging analysis software.14 Clinical assessments were done at presentation (ie, GCS or National Institutes of Health Stroke Scale (NIHSS)), and within 24 hours of the last infusion. For our analysis, we further excluded patients who were missing follow-up imaging or clinical information necessary to calculate the various prognostication scores. The necessary variables for calculation and their respective weighting within each prognostication model are listed in table 1.
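As an illustration of how a score of this kind is assembled from its components, the sketch below computes the original ICH score. The weights are taken from the published ICH score (reference 10) rather than from table 1, which is not reproduced in this text, so they should be checked against table 1 before use. The function and parameter names are hypothetical, and the example patient is invented for illustration only.

```python
def gcs_points(gcs: int) -> int:
    """Points for the Glasgow Coma Scale component of the original ICH score."""
    if not 3 <= gcs <= 15:
        raise ValueError("GCS must be between 3 and 15")
    if gcs <= 4:
        return 2
    if gcs <= 12:
        return 1
    return 0


def original_ich_score(gcs: int, volume_ml: float, ivh: bool,
                       infratentorial: bool, age: int) -> int:
    """Original ICH score (0-6), using weights assumed from the published score."""
    return (
        gcs_points(gcs)
        + (1 if volume_ml >= 30 else 0)   # haematoma volume of 30 mL or more
        + (1 if ivh else 0)               # intraventricular haemorrhage present
        + (1 if infratentorial else 0)    # always 0 in iDEF (supratentorial only)
        + (1 if age >= 80 else 0)         # age 80 years or older
    )


# Hypothetical patient: same age, but clinical and imaging inputs change by day 4.
baseline = original_ich_score(gcs=14, volume_ml=25.0, ivh=False,
                              infratentorial=False, age=62)
delayed = original_ich_score(gcs=11, volume_ml=35.0, ivh=True,
                             infratentorial=False, age=62)
print(baseline, delayed)
```

The same function serves for both the baseline and the delayed score; only the GCS and imaging inputs differ between the two calls.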
We calculated the ICH score,10 modified-ICH score,11 max-ICH score12 and the FUNC score13 (table 1), at initial presentation and in a delayed fashion. We calculated ‘baseline’ ICH scores using clinical assessments on initial presentation and CT scans used to confirm ICH. We calculated ‘delayed’ scores using clinical information and CT scans obtained after the last infusion of DFO or placebo. In the iDEF trial, infusions were started within 24 hours of presentation, and continued over a 3-day period, which means the delayed scans in our study were obtained 72–96 hours after initial presentation, well after the known timeline for haematoma expansion.15 Our outcomes were mortality assessed at 30 days, 90 days and 180 days, and poor functional outcome in survivors at the same time points, defined as mRS of 4–5. We generated ‘baseline’ versus ‘delayed’ receiver operating characteristic (ROC) curves for each score, and compared diagnostic accuracy using their respective area under the curve (AUC). AUCs were compared using non-parametric methods.16 A sensitivity analysis was performed to check whether the association between score and outcome was modified by the DFO treatment. Potential interactions between treatment effect and each delayed score were assessed using logistic regression models. Sensitivity analyses for the effect of sex and ethnicity were also performed; we stratified our patient cohort by male versus female sex and Hispanic or Latino ethnicity versus non-Hispanic/Latino ethnicity. Statistical analyses were repeated for each stratification by creating ROC curves for mortality and functional outcomes at the same time points. Statistical significance was set at p<0.05. All statistical analysis was performed using SPSS V.26.0 (IBM).
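The comparison of baseline and delayed AUCs on the same patients can be sketched as follows. The text states only that non-parametric methods were used in SPSS; the DeLong test for two correlated ROC curves shown here is an assumption, chosen because it is a common non-parametric option for paired AUCs. All function names are hypothetical and the data are toy values, not trial data.

```python
import math


def _psi(case: float, control: float) -> float:
    """1 if the case scores higher than the control, 0.5 on a tie, else 0."""
    return 1.0 if case > control else 0.5 if case == control else 0.0


def _components(scores, outcome):
    """Per-case (V10) and per-control (V01) placement values for one score."""
    cases = [s for s, y in zip(scores, outcome) if y]
    controls = [s for s, y in zip(scores, outcome) if not y]
    v10 = [sum(_psi(c, k) for k in controls) / len(controls) for c in cases]
    v01 = [sum(_psi(c, k) for c in cases) / len(cases) for k in controls]
    return v10, v01


def _cov(a, b):
    """Sample covariance with an (n - 1) denominator."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_paired(score_a, score_b, outcome):
    """Return (AUC_a, AUC_b, two-sided p) for two scores on the same patients."""
    v10a, v01a = _components(score_a, outcome)
    v10b, v01b = _components(score_b, outcome)
    m, n = len(v10a), len(v01a)          # number of cases and controls
    auc_a, auc_b = sum(v10a) / m, sum(v10b) / m
    # Variance of the AUC difference, accounting for the pairing.
    var = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / m
           + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / n)
    if var == 0:
        return auc_a, auc_b, float("nan")  # test undefined without variability
    z = (auc_a - auc_b) / math.sqrt(var)
    return auc_a, auc_b, math.erfc(abs(z) / math.sqrt(2))


# Toy data: 1 = died, 0 = survived; scores are illustrative only.
died     = [1, 1, 1, 0, 0, 0, 0, 0]
baseline = [2, 1, 3, 1, 0, 2, 1, 0]
delayed  = [3, 2, 4, 1, 0, 1, 1, 0]
auc_base, auc_delay, p_value = delong_paired(baseline, delayed, died)
print(round(auc_base, 3), round(auc_delay, 3))
```

The AUC here is the Mann-Whitney estimate: the proportion of case-control pairs in which the patient with the outcome has the higher score, counting ties as one half. Each outcome group needs at least two patients for the variance to be defined.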
The iDEF trial initially enrolled 294 patients, but 291 were included for analysis of main results due to clinical deterioration within the first 24 hours for three patients, precluding them from receiving the study drug, as per original study protocol. Eight patients were also excluded for withdrawal of consent and loss to follow-up. For our analysis, we further excluded six patients for missing the necessary clinical or radiographic information needed to perform score calculations. Within our analysis cohort, there were 277 patients total: 138 were treated with placebo, and 139 were treated with DFO (50.2%). For the max-ICH score which uses NIHSS instead of GCS, there was an additional patient who was missing clinical information, resulting in a cohort of 276 patients (see table 1 for individual score components). The majority of included patients were men (169 of 277, 61.0%) and non-Hispanic/Latino (234 of 277, 84.5%). Table 2 outlines other patient characteristics of those who were included versus excluded from analysis.
Score distributions for the ICH score, FUNC score, modified-ICH score and max-ICH score are provided in online supplemental figure I. At 180 days, 21 of 276 patients died (7.6%). Among the survivors, 54 of 255 had poor functional outcome at 180 days, defined as mRS of 4–5 (21.2%). The summary of the AUC values for each ROC curve using baseline versus delayed information is listed in table 3. ROC curves generated for 30-day, 90-day and 180-day mortality and poor functional outcome are shown in figures 1 and 2, respectively, for the original ICH score; ROC curves for the modified-ICH score, max-ICH score and the FUNC score are included in online supplemental figures II–IV.
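The cohort counts and percentages reported above can be re-derived with a few lines of arithmetic. The step labels below paraphrase the text, and the helper name `pct` is hypothetical.

```python
# Exclusion steps as described in the text (labels paraphrased).
flow = [
    ("enrolled in iDEF", 294),
    ("deteriorated before receiving study drug", -3),
    ("withdrew consent or lost to follow-up", -8),
    ("missing data needed for score calculation", -6),
]
analysis_cohort = sum(count for _, count in flow)
max_ich_cohort = analysis_cohort - 1      # one more patient lacked NIHSS data
survivors_180d = max_ich_cohort - 21      # 21 deaths by 180 days


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * part / whole, 1)


print(analysis_cohort, max_ich_cohort, survivors_180d)
print(pct(139, analysis_cohort))   # treated with DFO
print(pct(169, analysis_cohort))   # men
print(pct(234, analysis_cohort))   # non-Hispanic/Latino
print(pct(21, max_ich_cohort))     # died by 180 days
print(pct(54, survivors_180d))     # poor functional outcome among survivors
```

Every value agrees with the figures given in the text, including the 255 survivors used as the denominator for poor functional outcome.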
In our patient cohort, each ICH score calculated at baseline predicted mortality and poor functional outcome at all time points measured (30 days, 90 days and 180 days), with AUC values ranging from 0.63 to 0.84 (see table 3). For mortality, the delayed calculation of the original ICH score, modified-ICH, max-ICH score and FUNC score performed better at all time points compared with their respective baseline counterparts. The difference between baseline and delayed scoring did not meet statistical significance for the FUNC score (p=0.054, p=0.058 and p=0.061 for 30-day, 90-day and 180-day mortality, respectively), and for the modified-ICH score at 90 days (p=0.080).
For poor functional outcome, delayed calculation of the ICH score and the FUNC score did not perform significantly better than their baseline counterparts at any time point (table 3). Their respective AUC values for baseline versus delayed ROC curves for poor functional outcome did not demonstrate consistent trends: 0.63 vs 0.62 at 30 days (p=0.39), 0.69 vs 0.65 at 90 days (p=0.18), and 0.68 vs 0.66 at 180 days (p=0.43). The delayed scoring of the modified-ICH and max-ICH score was superior to baseline assessment at 30 days: 0.71 vs 0.75 (p=0.03) for modified-ICH score and 0.77 vs 0.82 (p=0.011) for max-ICH score. However, no difference was detected for either score at 90 or 180 days.
We found similar results when scoring was stratified by treatment group (see online supplemental tables I and II). Neither the first-order DFO effect nor the interaction term was statistically significant in any of the models for the ICH score, modified-ICH score and the max-ICH score. A statistically significant interaction between treatment effect and FUNC score was observed at all time points (see online supplemental table III).
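For readers less familiar with interaction models, one standard way to write a logistic regression with a first-order treatment effect and a score-by-treatment interaction is shown below. The exact specification used in the analysis is not given in the text, so this form and its symbols are assumptions: Y_i is the outcome (death or poor functional outcome), S_i the delayed score and T_i a treatment indicator (1 for DFO, 0 for placebo).

```latex
\operatorname{logit}\,\Pr(Y_i = 1)
  = \beta_0 + \beta_1 S_i + \beta_2 T_i + \beta_3 \,(S_i \times T_i)
```

Under this form, β2 corresponds to the first-order DFO effect and β3 to the interaction term. A β3 that differs from zero would indicate that the association between the score and the outcome differs between the DFO and placebo groups.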
Lastly, sensitivity analyses were performed to see if there was an effect of either sex or ethnicity on our results. The frequencies of our outcomes of interest are outlined in online supplemental table IV for sex, and online supplemental table V for ethnicity. The only outcome that was significantly different between men and women was the proportion of patients with ICH with poor functional outcome at 90 days (28.8% in men and 40.8% in women, p=0.049). However, this difference was not observed at 30 days and did not persist at 180 days (56.9% vs 55.0%, p=0.77 at 30 days, and 20.3% vs 26.1%, p=0.29 at 180 days). None of the outcomes of interest was significantly different between Hispanic/Latino versus non-Hispanic/Latino patients (online supplemental table V). Further breakdown of the various ICH scores and their AUC values from their respective ROC curves is outlined in online supplemental tables VI and VII for sex analysis, and online supplemental tables VIII and IX for ethnicity analysis.
By calculating multiple ICH prognostication scores at initial presentation and at 4 days after presentation, we found that delaying the calculation of most scores improved our predictive ability for mortality at all time points up to 180 days. Delayed scoring did not improve our ability to predict functional outcome among survivors.
The concept of the ‘self-fulfilling prophecy’ is well established in ICH, whereby perceived poor prognosis during the hyperacute phase of the disease may result in early care limitations and withdrawal of care. However, all ICH scores were created using parameters set by initial imaging/clinical assessments on presentation.10–13 While early reassessment in ICH has been shown to improve prognostication for 90-day outcomes,17 no previous study has looked at clinical endpoints beyond that. We hypothesise haematoma expansion and early neurological deterioration likely account for the improved performance of delayed ICH scores.18–20 The observed improvement in mortality prognostication is consistent with previous published work,6 21 22 and these results advise against early prognostication, which is in accordance with current American Heart Association guidelines.5 Our findings are important for clinical practice, as discussions around withdrawal of care in ICH typically occur days after admission, with the average decision to institute withdrawal of care orders around day 5.23 Our study suggests reassessment of ICH prognostication scores may help guide conversations around goals of care.
Interestingly, improvements in the ability to predict poor functional outcome were not observed among survivors using any of the ICH scores past 30 days. There is evidence to suggest that the recovery process for ICH survivors is much slower than ischaemic stroke, and recovery may occur throughout the first year after the initial insult.4 It is possible that we may still be measuring functional outcomes too early at 6 months. Furthermore, predicting functional outcomes is much more nuanced and complex than predicting mortality, which is a simple dichotomous outcome. Lastly, functional outcomes are often dependent on premorbid status and socioeconomic supports.13 24
Our sensitivity analysis for outcome differences by sex revealed that there were no significant differences between men and women for mortality at any time point (online supplemental table IV), which is similar to results from previous studies that examined sex-specific outcomes in ICH.25 26 While more women were disabled at 90 days compared with men among survivors in our study (40.8% vs 28.8%, p=0.049), this difference was not observed at 30 or 180 days. It is possible that women may have a different rate of recovery compared with men, or this may be due to chance given the relatively small sample size of female participants, and larger studies are warranted. There were no differences in outcomes between Latino/Hispanic patients versus non-Latino/non-Hispanic patients in our study (online supplemental table V). We also calculated the AUC values for ROC curves of baseline versus delayed prognostication scores when patients were stratified by sex and ethnicity (online supplemental tables VI–IX). Similar trends were observed in male patients and non-Latino/Hispanic patients, where delayed ICH, modified-ICH and max-ICH scores performed better than baseline for predicting mortality, but not functional outcomes past 30 days (online supplemental tables VI and IX). However, this trend was not observed in women, in whom only prediction of 30-day mortality was improved with delayed calculations (online supplemental table VII). Similarly, there was no statistical difference between delayed versus baseline scores in predicting mortality or functional outcome in Latino/Hispanic patients (online supplemental table VIII). We speculate these differences may be related to sample size, as the majority of the patients in our included cohort were men, and non-Hispanic/Latino. There may be differences in outcomes between ethnic groups,27 28 but larger epidemiological studies may be warranted.
Notably, although the FUNC score was created to predict functional independence among ICH survivors, it had the lowest prognostic performance in all of our analyses. This may be because the FUNC score was derived in a different patient population from ours: on average, the patients in iDEF were younger, had smaller ICH volumes and had less cognitive impairment (only 5 of 293 cases). It is also the only score we tested that had a significant first-order treatment effect and positive interaction with the score itself. This illustrates the point that predictive scales are often subject to substantial unexplained variance.29 In fact, Hwang et al have shown that early (within 24 hours) clinician judgement of prognosis correlated more closely with 3-month outcomes after ICH than prognostic scales, including the ICH score and the FUNC score.30 These findings, together with the results of our study, reinforce that routine use of prognostic scales for making clinical decisions for individual patients should be avoided, particularly early in the disease course. Furthermore, the context within which these prognostication scores are applied is important, as prognostication should never be provided without the guidance of an experienced care team.29–31
Our study has important limitations. While our study supports waiting until at least day 4 to prognosticate, it does not mean a decision needs to be made at that time. Waiting even longer to make decisions regarding withdrawal of care may be appropriate in specific clinical scenarios—for example, in the case of significant mass effect during early days. Next, the iDEF trial only included patients with supratentorial haemorrhages who were not on anticoagulation and had milder clinical presentations (ie, GCS >6), thus skewing our scores towards a patient population with mild–moderate ICH (see online supplemental figure I). More specifically, the results of our study cannot be extrapolated to infratentorial haemorrhages, as we did not have access to data from this subpopulation in the current analysis. Therefore, our results are difficult to generalise to a general patient population with ICH, as a clinical trial population naturally selects for patients who are less clinically severe compared with an unselected pragmatic population. Conversely, our patient population provided well-characterised clinical and radiological data with multiple time points for outcome assessments which allowed for a thorough and novel assessment of multiple prognostication scores. While an interaction was observed between treatment effect and FUNC score, we suspect that this interaction is statistical in nature, given the lack of interaction with the remaining scores and the relative similarities in the individual components that make up each score.
Delaying the calculation of prognostication scores in patients with acute ICH improved our ability to predict mortality, but did not improve performance for predicting long-term functional outcome among survivors. Clinicians should continue to be cautious when using prognostication scores to guide conversations around goals of care in ICH.
Contributors RL and DD are responsible for the conception and design of the study. MS contributed to the data acquisition and interpretation of data. TR contributed to data analysis and interpretation of data. RL and VY performed the data analysis and interpreted the data. RL drafted the manuscript. All listed authors helped revise the work for important intellectual content, and gave their final approval for the version published.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests MHS reports grants from NINDS during the conduct of the study. DD is supported by the Heart and Stroke Foundation of Canada Clinician-Scientist Award, and has a patent for computerised automated recognition of leakage software.
Patient consent for publication Not required.
Ethics approval All patients enrolled in iDEF gave written informed consent according to the requirements of their respective local research ethics board.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement Data are available upon reasonable request from the authors of the original iDEF Study.