Prediction of large vessel occlusion for ischaemic stroke by using the machine learning model random forests
  1. Jianan Wang,
  2. Jungen Zhang,
  3. Xiaoxian Gong,
  4. Wenhua Zhang,
  5. Ying Zhou,
  6. Min Lou
  1. Department of Neurology, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
  1. Correspondence to Dr Min Lou; lm99{at}zju.edu.cn

Abstract

Background The timely identification of large vessel occlusion (LVO) in the prehospital stage is extremely important given the disease morbidity and the narrow time window for intervention. However, evaluation with current strategies remains challenging. The goal of this study was to develop a machine learning (ML) model to predict LVO using prehospital accessible data.

Methods Consecutive acute ischaemic stroke patients who underwent CT or MR angiography and received reperfusion therapy within 8 hours from symptom onset in the Computer-based Online Database of Acute Stroke Patients for Stroke Management Quality Evaluation-II dataset from January 2016 to August 2021 were included. We developed eight ML models that integrate National Institutes of Health Stroke Scale (NIHSS) items with demographics, medical history and vascular risk factors to identify LVO, and validated their performance.

Results In total, 15 365 patients were included in the training set and 4215 patients were included in the test set. On the test set, random forests (RF), gradient boosting machine and extreme gradient boosting presented an area under the curve (AUC) of 0.831 (95% CI 0.819 to 0.843), which was higher than that of the other models, and RF presented the highest specificity (0.827). In addition, the AUC of RF was higher than that of the other scales, and the accuracy of the model was improved by 6.4% compared with NIHSS. We also found that the top five items for identifying LVO were total NIHSS score, gaze deviation, level of consciousness (LOC), LOC commands and motor left leg.

Conclusions Our proposed model could be a useful screening tool to predict LVO based on the prehospital accessible medical data.

Trial registration number NCT04487340.

  • arteries
  • stroke

Data availability statement

Data are available on reasonable request.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Introduction

Acute ischaemic stroke (AIS) with large vessel occlusion (LVO) is associated with poor functional outcomes and high mortality.1 2 Recently, mechanical thrombectomy (MT) for LVO has revolutionised treatment for the most disabling strokes, but this intervention is time critical.3 Although an estimated 11%–20% of patients with AIS can have LVO, fewer than 2% receive MT in daily practice.2 4–6 A relevant reason for non-treatment is delayed hospital presentation, often resulting in exclusion from treatment due to the loss of salvageable brain tissue. Notably, most delays occur in the prehospital phase of acute stroke management.2 To achieve the goal of rapid diagnosis, transfer and treatment, several scales are currently in use for identifying LVO in the prehospital setting, including the three-item Stroke Scale (3I-SS),7 gaze–face–arm–speech–time test (G-FAST),8 Cincinnati Pre-hospital Stroke Severity scale (CPSSS),9 FAST,10 Field Assessment Stroke Triage for Emergency Destination scale (FAST-ED),11 Finnish Prehospital Stroke Scale (FPSS),12 Los Angeles Motor Scale (LAMS),13 Pre-hospital Acute Stroke Severity scale (PASS),14 Rapid Arterial Occlusion Evaluation Scale (RACE),15 Recognition of Stroke in the Emergency Room (ROSIER),16 stroke vision, aphasia, neglect assessment (VAN),17 National Institutes of Health Stroke Scale (NIHSS),18 abbreviated NIHSS, modified NIHSS (mNIHSS), shortened versions of the NIHSS (sNIHSS) and sNIHSS for emergency medical services (EMSs).19

However, there are still issues that limit the predictive precision of these scales in recognising LVO. Previous studies showed that the use of published prehospital prediction scales for triage decision-making left 20% of LVO undetected, and no threshold on any scale could detect LVO with both a high sensitivity and a high specificity.3 20 21 The main reason is that the accuracy of scales for detecting LVO is related to the prevalence of symptoms, which depends on dispatch call identification and local prevalence.2 Furthermore, the algorithms of the scales are all based on the hypothesis of a linear correlation between input and output parameters. They ignore the intricate associations among the input parameters and are unable to integrate various stroke-related parameters (eg, medical history or family history) in a user-independent fashion.

Recent advances in the application of machine learning (ML) in many healthcare areas have inspired the development of novel ML-based LVO diagnostic technology. ML is non-parametric, does not impose a particular structure on the data and has been shown to capture non-linear relationships. It may therefore be an appropriate tool for identifying LVO, given that most medical data are non-linear, non-normal, correlation structured and complex in nature. We therefore developed a predictive framework for LVO based on ML methods using prehospital accessible data, including demographics, NIHSS items, medical history and vascular risk factors, and then compared its diagnostic parameters with those of previously established prehospital prediction scales.

Materials and methods

Study population

This study used data from a multicentre prospective registry, the Computer-based Online Database of Acute Stroke Patients for Stroke Management Quality Evaluation (CASE-II, NCT04487340). Initiated in 2016, CASE-II was designed to examine the current status of stroke care in China, with the data intended to help develop strategies to improve stroke care. In-hospital medical documents of consecutive stroke patients were collected through a dedicated electronic data capture system.

We retrospectively reviewed the CASE-II dataset from January 2016 to August 2021 and enrolled consecutive AIS patients who underwent CT angiography (CTA) or time of flight MR angiography (TOF-MRA), received reperfusion therapy within 8 hours from symptom onset, and had complete emergency assessment information for analysis. Patients with poor image quality due to motion artefacts were excluded. We then split the data into training and test sets by time period: patients from January 2016 to January 2021 were included in the training set, and patients from January 2021 to August 2021 were included in the test set. Details of patient characteristics and the inclusion and exclusion criteria are given in figure 1. Demographic, clinical, laboratory and imaging data at admission were recorded, including age and gender; prior antiplatelet therapy and prior anticoagulant therapy; risk factors (smoking, hypertension, atrial fibrillation (AF), diabetes mellitus, hyperlipidaemia, hyperhomocysteinaemia, coronary heart disease, congestive heart failure, history of stroke/transient ischaemic attack (TIA) and family history of cardiovascular disease); and blood pressure at admission.
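A split by time period of this kind can be sketched as follows. This is a minimal illustration rather than the authors' code: the text names January 2021 for both sets, so the exact cut-off date used here is an assumption, and the record fields are hypothetical rather than the CASE-II schema.

```python
from datetime import date

# Assumed boundary: the text gives "January 2021" for both sets, so the exact
# cut-off day is a guess made only for this sketch.
CUTOFF = date(2021, 1, 1)

def temporal_split(records, cutoff=CUTOFF):
    """Return (train, test): records admitted before the cut-off go to train."""
    train = [r for r in records if r["admitted"] < cutoff]
    test = [r for r in records if r["admitted"] >= cutoff]
    return train, test

# Toy records with hypothetical field names; not data from the study.
records = [
    {"id": 1, "admitted": date(2016, 3, 14), "lvo": 0},
    {"id": 2, "admitted": date(2020, 12, 31), "lvo": 1},
    {"id": 3, "admitted": date(2021, 1, 1), "lvo": 1},
    {"id": 4, "admitted": date(2021, 8, 20), "lvo": 0},
]
train, test = temporal_split(records)
print([r["id"] for r in train], [r["id"] for r in test])
```

Splitting by admission date rather than at random keeps every test patient later in time than every training patient, which more closely mimics applying a fitted model to future patients.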

Figure 1

Flow chart of the study population and process. AIS, acute ischaemic stroke; CTA, CT angiography; NIHSS, National Institutes of Health Stroke Scale; TOF-MRA, time of flight MR angiography.

LVO was defined as unilateral occlusion of the intracranial internal carotid artery (ICA), the M1/M2 segments of the middle cerebral artery (MCA) or the basilar artery (BA) on baseline CTA or TOF-MRA. Each patient’s large vessel condition was assessed independently by two experienced senior neurologists who were blinded to the study design, with any disputes settled by a third neurologist for a consensus decision.

Feature selection

Based on the published literature, pathophysiological considerations, and the availability and convenience of items in the prehospital setting, 15 NIHSS items, age, gender, prior antiplatelet therapy, prior anticoagulant therapy, risk factors (smoking, hypertension, AF, diabetes mellitus, hyperlipidaemia, hyperhomocysteinaemia, coronary heart disease, congestive heart failure, history of stroke/TIA and family history of cardiovascular disease) and blood pressure at admission were selected as candidate features.22–25 Features that reached statistical significance in the statistical analysis, defined as p<0.05, were then chosen as the model variables. Selection of final parameters was performed using the Scikit-Learn package in Python software.
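For a binary candidate feature, the p<0.05 screening step could look like the sketch below. It assumes a Pearson χ2 test on a 2x2 table of feature against LVO status, without continuity correction; the text does not say which test was applied to each feature at this stage, and the counts and the names `pearson_chi2_2x2` and `screen_features` are invented for illustration.

```python
import math

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and p value for the table
    [[a, b], [c, d]]: rows = feature present/absent, columns = LVO/non-LVO."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # degenerate table: no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))

def screen_features(tables, alpha=0.05):
    """Keep the features whose association with LVO has p < alpha."""
    return [name for name, t in tables.items() if pearson_chi2_2x2(*t)[1] < alpha]

# Made-up counts for illustration only; not data from the study.
toy_tables = {
    "atrial_fibrillation": (30, 20, 20, 80),
    "hypothetical_feature": (10, 10, 10, 10),
}
retained = screen_features(toy_tables)
print(retained)
```

Continuous candidates such as age or blood pressure would need a different test (for example a t-test or Mann-Whitney U test), which this sketch does not cover.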

Data analysis

We chose eight common ML models: random forests (RFs), logistic regression, Extreme Gradient Boosting (XGBoost), K-Nearest Neighbour, Ada Boosting, Gradient Boosting Machine (GBM), LightGBM and artificial neural network (ANN). We developed the models using the Scikit-Learn package in Python software.26–28

Specifically, model development was based on the training set with the retained features. Ten-fold cross-validation was performed for model derivation and internal evaluation: the training set was divided into 10 mutually exclusive parts, nine of which were used as training data for model derivation and one as inner validation data. This process was repeated 10 times to generate 10 different but overlapping training subsets and 10 unique validation subsets. In the training step, we optimised model hyperparameters with a grid search algorithm, using the area under the curve (AUC) of the receiver operating characteristic (ROC) as the score during the search, and fixed the hyperparameter values only once the model with the highest F1 score had been found. The non-linear models were trained on a dichotomisation of AIS patients into LVO versus non-LVO. Such models incorporate not only linear correlations between input and output parameters but also the intricate associations among the input parameters, and can integrate various stroke-related parameters. In this way, non-linear combinations of the prehospital accessible data of the AIS patients could be taken into account to predict the large artery condition. We evaluated each model in the test set and compared its predictive power with that of the previously established scales.
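The fold structure described above can be illustrated with a short, library-free sketch. The function name `k_fold_indices`, the shuffling seed and the toy sample size are assumptions for illustration; the authors' own code is not shown in the text.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs with k mutually exclusive validation folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i, so fold sizes
    # differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, val_idx in enumerate(folds):
        # The other k - 1 folds form the training data for this round.
        train_idx = [j for f_no, fold in enumerate(folds) if f_no != i for j in fold]
        yield train_idx, val_idx

splits = list(k_fold_indices(n_samples=25, k=10))
for train_idx, val_idx in splits[:2]:
    print(len(train_idx), sorted(val_idx))
```

Each sample appears in exactly one validation fold and in nine training subsets, which is why the training subsets overlap while the validation subsets do not. In Scikit-Learn, a comparable search is available through `GridSearchCV` with `cv=10` and `scoring="roc_auc"`; whether the authors used that class is not stated.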

Code availability

The code used to generate results shown in this study is available from the corresponding author on request.

Statistical analysis

Patients were dichotomised into the LVO and non-LVO groups. Continuous variables were summarised as median (IQR), and differences between the two groups were assessed with the t-test or Mann-Whitney U test. Categorical or binary data were summarised as proportion (n), and differences between the two groups were assessed with the Pearson χ2 test. ROC analysis was used to obtain the AUC of each prehospital prediction scale. The ROC-derived optimal cut-off was determined at the maximal Youden Index. Finally, we calculated sensitivity, specificity and accuracy for the prediction of LVO. All statistical analyses were performed using SPSS, V.22.0 (IBM). All comparisons were two sided, with statistical significance defined as p<0.05.
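The scale evaluation described above (AUC, Youden-optimal cut-off, then sensitivity, specificity and accuracy) can be sketched in plain Python. The scores and labels below are invented toy values, not study data, and the sketch assumes that LVO is predicted when a score is at or above the threshold.

```python
def confusion(scores, labels, threshold):
    """Counts (tp, fp, tn, fn) when LVO is predicted for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return tp, fp, tn, fn

def metrics(scores, labels, threshold):
    """Sensitivity, specificity and accuracy at a given threshold."""
    tp, fp, tn, fn = confusion(scores, labels, threshold)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(labels),
    }

def youden_cutoff(scores, labels):
    """Threshold maximising the Youden Index J = sensitivity + specificity - 1."""
    def j(t):
        m = metrics(scores, labels, t)
        return m["sensitivity"] + m["specificity"] - 1
    return max(sorted(set(scores)), key=j)

def auc(scores, labels):
    """AUC as the probability that a random LVO case outscores a random
    non-LVO case, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy values (made up): a total stroke-scale score and the LVO label per patient.
toy_scores = [2, 4, 5, 7, 9, 12, 15, 18]
toy_lvo = [0, 0, 0, 0, 1, 0, 1, 1]

cutoff = youden_cutoff(toy_scores, toy_lvo)
print(cutoff, metrics(toy_scores, toy_lvo, cutoff), auc(toy_scores, toy_lvo))
```

The pairwise AUC used here is equivalent to the area under the empirical ROC curve and is quadratic in the number of patients, so it suits small illustrations rather than cohorts of this size.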

Results

Study population

In total, 15 365 patients were included in the training set from January 2016 to January 2021, and 4215 patients were included in the test set from January 2021 to August 2021. Of the included patients, median age was 70 (60–79) years, 7386 (37.7%) were female and median NIHSS on admission was 6 (3–13). Baseline demographics, medical history, NIHSS and risk factors are listed in table 1. Compared with patients in the training set, those in the test set were less likely to be female, to have AF, coronary heart disease, a family history of cardiovascular disease, hyperlipidaemia or hyperhomocysteinaemia, or to smoke, and were more likely to have LVO and to present with a lower baseline NIHSS score at admission (table 1).

Table 1

Comparison of clinical characteristics between cohort of the training set and test set

Model performance in the test set

Table 2 shows the AUC, sensitivity, specificity and accuracy of the eight models in the test set. RF, GBM and XGBoost presented a higher AUC than the other five models, and RF presented higher specificity than GBM and XGBoost. Therefore, we chose the RF model as the final prediction model.

Table 2

Comparison of eight models to predict LVO in the test set

In addition, table 3 shows the comparison of AUC, sensitivity, specificity and accuracy between RF and previously established prehospital prediction scales, including 3I-SS, G-FAST, CPSSS, FAST, FAST-ED, FPSS, LAMS, PASS, RACE, ROSIER, VAN, NIHSS, mNIHSS, sNIHSS and sNIHSS-EMS. Both the AUC and the accuracy of RF were better than those of the other prehospital prediction scales. Compared with NIHSS in this study, the accuracy of the model was improved by 6.4%.

Table 3

Comparison of various published clinical scales with RF model to predict LVO in the test set

Importance of features contributing to identification of LVO

Gini importance is a measure of feature importance, defined as the total reduction of the split criterion brought by that feature. The Gini importance of every feature was calculated. As figure 2 shows, items incorporated in NIHSS were identified as significant contributors to the LVO estimation, and the top five items were total NIHSS score (0.2381), gaze deviation (0.1412), level of consciousness (LOC) (0.0759), LOC commands (0.0613) and motor left leg (0.0567). Interestingly, systolic blood pressure (0.0233), diastolic blood pressure (0.0217) and AF (0.0159) also contributed to determining LVO, suggesting that not only symptoms but also the medical history of patients needs to be taken into account in LVO identification.
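The idea behind Gini importance can be illustrated for a single tree split. The sketch below computes the weighted impurity decrease of one split and then normalises per-feature totals so that they sum to 1. The labels, the single split and the second feature's total are toy values, and in a real forest the decreases are accumulated over every node and tree that uses a feature.

```python
def gini(labels):
    """Gini impurity of a binary label list: 1 - p0^2 - p1^2."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2

def impurity_decrease(parent, left, right, n_total):
    """Impurity reduction of one split, weighted by the share of samples at the node."""
    n = len(parent)
    child = (len(left) * gini(left) + len(right) * gini(right)) / n
    return (n / n_total) * (gini(parent) - child)

def normalise(totals):
    """Scale per-feature totals so that the importances sum to 1."""
    s = sum(totals.values())
    return {name: value / s for name, value in totals.items()}

# Toy node (made-up labels): 1 = LVO, 0 = non-LVO, split on a hypothetical
# "gaze deviation present" test at the root of one tree.
parent = [0, 0, 0, 0, 1, 1, 1, 1]
left, right = [0, 0, 0, 1], [0, 1, 1, 1]
decrease = impurity_decrease(parent, left, right, n_total=len(parent))

# The second total is an arbitrary illustrative number, not a computed one.
importances = normalise({"gaze_deviation": decrease, "other_feature": 0.375})
print(decrease, importances)
```

Because the importances are normalised to sum to 1, the values reported for individual items are best read as relative shares rather than as absolute effect sizes.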

Figure 2

Illustration of features contributing to identification of LVO by Gini importance values. Gini importance is a measure of feature importance in the model; the higher the value, the more important the feature. LOC, level of consciousness; NIHSS, National Institutes of Health Stroke Scale.

Discussion

To our knowledge, this was the first study to use the ML model RF to predict LVO based on large training and test sets. Our results showed that the diagnostic parameters of RF for the identification of LVO were significantly higher than those of previously established prehospital prediction scales. Furthermore, we found that total NIHSS score, gaze deviation, LOC, LOC commands and motor left leg were important items for identifying LVO.

In 2015, several trials provided compelling evidence that, for stroke due to LVO, MT resulted in significantly better recanalisation and clinical outcomes than did intravenous thrombolysis alone. Since then, LVO management has been revolutionised by the evidence of MT effectiveness. In addition, guidelines have suggested that patients with potential LVO might benefit from direct transport to a thrombectomy-capable centre, regardless of travel times.29 Unfortunately, 20% of LVO remain undetected although several prehospital prediction scales have been developed to detect LVO.20 Additionally, scales for detecting LVO are not effective for strokes in the posterior circulation, and patients with right LVO might be misdiagnosed when they present with mild to moderate neurological deficits.8 30 The accuracy of scales for detecting LVO is related to the prevalence of symptoms, which depends on dispatch call identification and local prevalence.2 Furthermore, the algorithms of the scales are all based on the hypothesis of a linear correlation between input and output parameters and ignore the intricate associations among the input parameters, so they are unable to integrate various stroke-related parameters. Meanwhile, most scales are derived from NIHSS. However, certain NIHSS items and specific patterns of combined deficits may carry a high attributable risk of LVO that is not equal to the simple sum of the scores.

An advantage of the RF algorithm is that it can capture non-linear relationships, including interactions among the input parameters, which helps it reach high accuracy. Previously, prehospital triage tools for detection of LVO were kept as simple as possible, focusing solely on elements of the neurological examination, so that they could be easily memorised by EMSs personnel.8 However, a growing body of literature suggests that not only symptoms but also medical history, such as atherosclerotic stenosis24 and AF,25 are associated with LVO; ignoring these stroke-related parameters may therefore reduce the accuracy of prediction. Our RF model contained NIHSS items and stroke-related parameters, which might be one reason for its higher predictive performance than previous scales. One might object that assessing these input parameters may take more time. However, most of the parameters are included in routine evaluation under emergency conditions, and NIHSS is still the main assessment in clinical practice.2 Moreover, in the mobile internet era, users can obtain intuitive outputs by simply entering measured parameters, leaving the complex underlying algorithm to an online calculation tool or local mobile application. In this setting, the ability to discriminate is more important than simplicity of use in LVO prediction tools.

In addition, we found that gaze deviation, LOC, LOC commands and motor left leg, all incorporated in NIHSS, were the most important items for identifying LVO in this study, especially gaze deviation, whose importance was much greater than that of the others. This finding was consistent with previous studies. Vidale and Agostoni31 reported that overall accuracy would be improved by considering the presence of gaze deviation in the scoring systems, and Scheitz et al8 also reported that adding gaze deviation to a scale could improve specificity. In addition, several studies have reported significant correlations between gaze deviation, LOC, LOC questions, LOC commands, facial palsy and LVO.8 9 14 15 17 We noted that several studies reported that neglect had a significant correlation with LVO,16 17 but neglect did not show significant importance in our study. This may be explained by the low incidence of neglect and the low consistency of neglect evaluation.23 Interestingly, we found that AF was more important than the sensory item in determining LVO, for which there are some potential explanations. AF allows blood to stagnate, particularly in the left atrial appendage, and both permanent and paroxysmal AF increase the risk of stroke.1 Previous studies have also reported that LVO was independently associated with AF.25 32

Notably, we found that the accuracy of our model was lower than that of the ANN model reported previously (0.772 vs 0.820).22 We also applied an ANN model to our data to detect LVO, and found its predictive performance inferior to RF (accuracy: 0.761 vs 0.772). There are some potential reasons. First, the previous cohort was small (only 600 cases) and used cross-validation to validate predictive performance in a single cohort, which could overfit the dataset and lead to more bias and incorrect predictions, whereas our cohort was very large (19 580 cases) and used an independent cohort to evaluate the model, making our result more compelling. Furthermore, the populations of the two studies were different. The LVO proportion in the previous study was 58.3%, while the proportion in our study was 28.7% in the training set and 30.3% in the test set, which was closer to the real prehospital setting (21.3%–35.9%)8 9 11 15 23 and reflected practical rather than theoretical predictive performance.

In addition, we noted that the AUC of NIHSS reported in some studies was higher than that in our paper, and even higher than the AUC of our model.23 33 This difference may be due to the different characteristics of the study populations. First, the study populations in the previous studies were much smaller than ours (543 and 178 vs 4215) and came from only one centre. In contrast, our study drew on a multicentre prospective registry with a large number of subjects; therefore, the generalisability of our conclusion was higher than that reported previously. Furthermore, LVO was defined as occlusion of the ICA and of the proximal segments (M1, M2) of the MCA in previous studies, whereas the definition of LVO in our study also included occlusion of the BA. The performance of NIHSS in detecting occlusion of the BA was low, which reduced the AUC of NIHSS for predicting LVO in our study. We also noted that the accuracy and specificity of our model were higher than those of NIHSS, while its sensitivity was slightly lower. Because this gain was driven by a relatively higher specificity, the finding that our model had higher accuracy than NIHSS could be considered diluted. However, there are some potential reasons that explain the rationale of our model. Recently, prehospital LVO screening scales with high sensitivity and low specificity were reported to lead to interfacility transfer-related delays,34 so-called ‘short cuts make long delays’, especially for patients with non-LVO. For scales with high sensitivity, too many non-LVO patients would be transferred to comprehensive stroke centres (CSCs), which would lead to CSCs being overburdened and to delays in thrombolytic therapy. Therefore, our model, with slightly lower sensitivity and high specificity, is a rational way to reduce the non-LVO population mistakenly identified as LVO, and may reduce interfacility transfer-related delays for non-LVO patients. Additionally, in clinical practice, the RF model, with its relatively high specificity, can not only triage patients but also help them receive targeted treatment.

Our study has some limitations. First, all patients were diagnosed with AIS and received reperfusion therapy. Consequently, the sensitivity and specificity of the RF model for LVO might differ in prehospital cohorts with suspected stroke that include stroke mimics and haemorrhagic strokes. Thus, we cannot rule out selection bias. However, our results still confirmed the feasibility of using RF to identify LVO from prehospital accessible data and suggest a direction for further studies to identify LVO in the prehospital setting. Further studies performed in the prehospital setting are necessary to generalise our results. Second, the retrospective and observational design carries an inherent potential for bias. Our results need to be confirmed prospectively in prehospital emergency cohorts.

Conclusions

In conclusion, our study showed that the ML model RF could be a useful screening tool with excellent accuracy for predicting LVO in ischaemic stroke patients based on prehospital accessible medical data.

Ethics statements

Patient consent for publication

Ethics approval

The study was approved by the human ethics committee of the Second Affiliated Hospital of Zhejiang University School of Medicine. The clinical investigation was conducted according to the principles expressed in the Declaration of Helsinki.

Footnotes

  • Contributors JW, JZ, XG, ML were involved in the design of the study. JW, WZ and YZ did the statistical analysis and wrote the first draft. All authors contributed to the further drafts. ML was responsible for the overall content as the guarantor. JW, JZ and XG contributed equally to this paper. The corresponding author attests that all listed authors meet the authorship criteria. All authors read and approved the final manuscript.

  • Funding This work was supported by the Science Technology Department of Zhejiang Province (2018C04011), the National Natural Science Foundation of China (81971101) and the National Key Research and Development Program of China (2016YFC1301503).

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.
