Abstract
Background Identification of futile recanalisation following endovascular therapy (EVT) in patients with acute ischaemic stroke is both crucial and challenging. Here, we present a novel risk stratification system based on a hybrid machine learning method for predicting futile recanalisation.
Methods Hybrid machine learning models were developed to address six clinical scenarios within the EVT and perioperative management workflow. These models were trained on a prospective database using a hybrid feature selection technique to predict futile recanalisation following EVT. The optimal model was validated and compared with existing models and scoring systems in a multicentre prospective cohort to develop a hybrid machine learning-based risk stratification system for futile recanalisation prediction.
Results Using a hybrid feature selection approach, we trained and tested multiple classifiers on two independent patient cohorts (n=1122) to develop a hybrid machine learning-based prediction model. The model demonstrated superior discriminative ability compared with other models and scoring systems (area under the curve=0.80, 95% CI 0.73 to 0.87) and was transformed into a web application (RESCUE-FR Index) that provides a risk stratification system for individual prediction (accessible online at fr-index.biomind.cn/RESCUE-FR/).
Conclusions The proposed hybrid machine learning approach could be used as an individualised risk prediction model to facilitate adherence to clinical practice guidelines and shared decision-making for optimal candidate selection and prognosis assessment in patients undergoing EVT.
- Stroke
- Risk Factors
- Thrombectomy
Data availability statement
Data are available on reasonable request.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
WHAT IS ALREADY KNOWN ON THIS TOPIC
Several outcome prediction models have been developed for acute ischaemic stroke (AIS) patients undergoing endovascular therapy (EVT). However, identification of unfavourable outcomes after successful recanalisation remains a challenge. While existing models and risk scores have the potential to facilitate patient selection and prognostication, their application in daily clinical practice is hampered by various limitations.
WHAT THIS STUDY ADDS
This study introduces a hybrid machine learning-based risk stratification tool that incorporates all relevant features of the general EVT workflow and perioperative management to accurately predict futile recanalisation in AIS patients undergoing EVT within the time frame from emergency room arrival to 24 hours post-EVT.
HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY
The proposed model carries the potential to provide clinicians and researchers with a simple automated tool for optimal candidate selection and prognosis assessment in patients undergoing EVT, and to facilitate adherence to clinical practice guidelines and aid in informed decision-making processes.
Introduction
Endovascular therapy (EVT) is a highly effective treatment for acute ischaemic stroke (AIS) with large-vessel occlusion (LVO) and has revolutionised AIS treatment strategies worldwide. However, in clinical practice, approximately half of patients do not achieve favourable outcomes despite successful endovascular recanalisation, a phenomenon termed futile recanalisation.1 2
Current guidelines3 4 do not include information on estimating the risk of futile recanalisation following EVT in AIS. A powerful tool for predicting futile recanalisation after EVT is urgently needed to better guide treatment decisions and provide prognostic information for patients and their families. Predictors of futile recanalisation5 can be used to evaluate the necessity of pursuing EVT.
Over the past decade, several outcome prediction models have been developed for AIS patients undergoing EVT.6 However, identification of unfavourable outcomes after successful recanalisation is still challenging, and only a limited number of models7 can predict futile recanalisation based on the information available following successful recanalisation. Hence, further studies are needed to build accurate predictive models for futile recanalisation. While the currently available models and risk scores have the potential to facilitate candidate selection and prognosis assessment, their utilisation in daily clinical practice is hampered by various limitations, including issues related to generalisability, impact analysis and routine evaluation of robust predictors.
In comparison with traditional modelling methods such as statistical logistic regression, machine learning (ML) offers a range of algorithms that are free from linear assumptions and can handle collinearity with regularisation hyperparameters. ML has demonstrated effectiveness in modelling multifactorial events in various fields, including bioinformatics. Hybrid ML algorithms8 9 exhibit much greater scalability, as they accommodate a large number of features and parameters within the models.10 Traditional ML is prone to the curse of dimensionality. If there are abundant features, a large amount of data is needed to train the model properly; otherwise, the risk of overfitting is high. Hybrid ML shows the potential to overcome these issues. Thus, hybrid ML constitutes a promising method for outcome prediction and may be superior to classical logistic regression and ML models.
Therefore, the current study aims to develop a hybrid ML-based risk stratification system for the prediction of futile recanalisation using clinical, imaging and treatment data of AIS patients undergoing EVT from a large EVT cohort and to evaluate its performance in two validation sets (internal cross-validation and external validation). We hypothesise that hybrid ML can capture high-dimensional, non-linear relationships among multimodal clinical features, and that the hybrid ML-based risk stratification system can predict futile recanalisation in individual patients more accurately than existing risk scores. The proposed model has the potential to provide clinicians and researchers with a simple automated tool for optimal candidate selection and prognosis assessment in patients undergoing EVT, and to facilitate adherence to clinical practice guidelines and aid in informed decision-making processes.
Methods
In brief, six hybrid ML prediction models corresponding to six clinical scenarios in the workflow of EVT and perioperative management were developed. These models were constructed using data from patients enrolled in the Registration study for Critical Care of Acute Ischaemic Stroke after Recanalisation (RESCUE-RE) registry (n=1218).11 The optimal models were validated in a multicentre prospective cohort (n=263). The two cohorts were completely independent. The primary outcome was futile recanalisation, which was defined as an unfavourable 90-day outcome, namely a modified Rankin Scale (mRS) score ≥3, despite successful reperfusion (modified Thrombolysis in Cerebral Infarction (mTICI) grade 2b–3 reperfusion flow after EVT). Hybrid ML models were constructed to predict futile recanalisation, in which variables were selected using the two-stage feature selection pipeline (TFSP). In the first stage, a paired t-test was used for continuous data and a χ2 test for categorical data to obtain statistically significant features. Then, the statistically significant features were combined with the global optimisation capabilities of a genetic algorithm, with the aim of obtaining superior feature subsets that address a broader range of problems. These features were ranked based on their importance or performance in classification, and the top 10 features were selected. Subsequently, a hybrid ML-based risk stratification system was developed for the prediction of futile recanalisation, following a structured research process consisting of five steps: data preprocessing, feature selection, model training, performance evaluation and external validation, as shown in figure 1.
Derivation cohort
From July 2018 to May 2019, AIS patients treated with EVT within 24 hours of stroke onset who met the following criteria were enrolled into the derivation cohort used for the development of the proposed models from RESCUE-RE: (1) age >18 years old; (2) AIS diagnosed based on cerebral imaging with documented LVO in the anterior or posterior circulation, which was confirmed by CT angiography or MR angiography (intracranial carotid artery, middle cerebral artery, anterior cerebral artery, basilar artery and vertebral artery); (3) successfully recanalised, defined as mTICI of 2b or 3 after EVT; (4) prestroke mRS score ≤2 and (5) patients were followed up for 3 months. Details of RESCUE-RE have been previously reported.
Variables and outcomes measurement
Variables used in the current study included baseline characteristics and details on workflow, EVT procedure, perioperative treatment, imaging and clinical outcomes. National Institutes of Health Stroke Scale (NIHSS) score at admission was used to assess stroke severity. The primary outcome was poor functional outcome despite successful recanalisation, defined as a dichotomised mRS score of 3–6 at 90 days.
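The outcome definition above can be expressed as a small predicate. The following is a minimal sketch rather than the study's code: the function name and the string encoding of mTICI grades are hypothetical, and grade 2c is counted as successful on the assumption that it falls within the stated 2b–3 range.

```python
# Grades counted as successful reperfusion (mTICI 2b-3). Including "2c" is an
# assumption: the text only states the range 2b-3.
SUCCESSFUL_MTICI = {"2b", "2c", "3"}


def is_futile_recanalisation(mtici_grade: str, mrs_90d: int) -> bool:
    """Return True when reperfusion succeeded but the 90-day mRS is 3-6."""
    if not 0 <= mrs_90d <= 6:
        raise ValueError("mRS must be between 0 and 6")
    if mtici_grade not in SUCCESSFUL_MTICI:
        # The outcome is only defined for successfully recanalised patients.
        raise ValueError("patient was not successfully recanalised")
    return mrs_90d >= 3


print(is_futile_recanalisation("3", 4))   # poor outcome despite mTICI 3
print(is_futile_recanalisation("2b", 1))  # favourable outcome
```

Raising an error for unsuccessful reperfusion, instead of returning False, keeps patients outside the study population from being silently counted as non-futile.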
Model definition
The current study developed six models corresponding to six clinical scenarios in the workflow of EVT and perioperative management (online supplemental figure 1). The predictors were clustered in six models (models 1–6) according to the time of acquisition from emergency room (ER) arrival to 24 hours post-EVT: baseline parameters from primary clinical evaluation after ER arrival, baseline imaging parameters, baseline laboratory indexes, initial digital subtraction angiography (DSA) parameters, EVT procedure parameters, and postprocedural and post-treatment parameters (online supplemental table 1). These six models encompassed the following time points: on ER arrival, after initial imaging evaluation, after initial laboratory tests, after initial DSA, immediately after EVT and 24 hours post-EVT, respectively. Each group of predictors was incrementally added to the previous model, with model 1 including only baseline patient data. This process yielded six models with increasing extensiveness.
The developed models were internally validated in the derivation cohort. Subsequently, the most extensive model was validated in an external dataset and used to develop a hybrid ML-based risk stratification system for futile recanalisation prediction.
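The incremental construction of models 1–6 can be sketched as a cumulative union of predictor groups. This is an illustrative sketch only: the group labels follow the time points named in the text, but the variable names inside each group are hypothetical placeholders and not the predictors listed in online supplemental table 1.

```python
from itertools import accumulate

# Predictor groups in order of acquisition. Variable names are hypothetical.
PREDICTOR_GROUPS = [
    ("ER arrival", ["age", "baseline_nihss"]),
    ("initial imaging", ["aspects"]),
    ("initial laboratory tests", ["glucose"]),
    ("initial DSA", ["occlusion_site"]),
    ("EVT procedure", ["puncture_to_recanalisation_min"]),
    ("24 hours post-EVT", ["nihss_24h", "infarct_volume_24h"]),
]


def build_nested_models(groups):
    """Return {model number: cumulative predictor list} for models 1..n."""
    cumulative = accumulate((names for _, names in groups), lambda a, b: a + b)
    return {number: feats for number, feats in enumerate(cumulative, start=1)}


models = build_nested_models(PREDICTOR_GROUPS)
for number, features in models.items():
    print(f"model {number}: {len(features)} predictors")
```

Each model contains every predictor of the model before it, so a later model can be applied only once its time point in the workflow has been reached.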
Model development
As shown in figure 1, the research process involved five key steps: data preprocessing, feature selection, model training, performance evaluation and external validation.
Data preprocessing
The initial step involved manual variable selection, where primary and critical clinical variables were chosen from a vast pool of thousands based on expert judgement and their availability within the derivation cohort. In the selection process, priority was given to causal and modifiable factors. The RESCUE-RE study contained many variables with similar semantic information, such as the location of responsible vessels. Moreover, there were also low-level semantic information variables that could be integrated into higher-level variables. Variables with ≥20% missing values were excluded from further analysis. Multivariate imputation via chained equations was used to impute missing values. Categorical variables were binarised with one-hot encoding. Afterwards, these variables were normalised to scale all individual features to a unit norm.
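Three of the preprocessing steps described above (exclusion of variables with ≥20% missing values, one-hot encoding and unit-norm scaling) can be sketched in plain Python as follows. Multivariate imputation via chained equations is deliberately omitted here. The helper names and the toy data are hypothetical, and scaling each feature column to unit Euclidean norm is one reading of the normalisation step, not a confirmed detail of the study.

```python
import math

MISSING_CUTOFF = 0.20  # variables with >= 20% missing values are excluded


def drop_sparse_variables(columns, cutoff=MISSING_CUTOFF):
    """Keep columns whose fraction of missing (None) values is below cutoff."""
    kept = {}
    for name, values in columns.items():
        missing_fraction = sum(v is None for v in values) / len(values)
        if missing_fraction < cutoff:
            kept[name] = values
    return kept


def one_hot(values):
    """Binarise a categorical column into one 0/1 column per category."""
    categories = sorted(set(values))
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


def scale_to_unit_norm(values):
    """Divide a numeric column by its Euclidean (L2) norm."""
    norm = math.sqrt(sum(v * v for v in values))
    return list(values) if norm == 0 else [v / norm for v in values]


# Toy data with hypothetical variable names.
columns = {
    "age": [60, 72, 81, 55, 67],
    "glucose": [6.1, None, 7.4, 5.9, 6.8],  # exactly 20% missing: excluded
    "hba1c": [None, None, 6.0, None, 5.5],  # 60% missing: excluded
}
kept = drop_sparse_variables(columns)
age_scaled = scale_to_unit_norm(kept["age"])
site = one_hot(["ICA", "MCA", "MCA", "BA", "ICA"])
print(list(kept), sorted(site))
```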
Feature selection
The current study adopted a TFSP12 for feature selection. In the first stage, paired t-tests were used for continuous data, and χ2 tests for categorical data, with significance set at p<0.05, to identify statistically significant features. Subsequently, different feature selection methods were compared and the best one was chosen to rank the statistically significant features (p<0.05). Finally, multiple decision trees in a random forest (RF) were used to assess feature importance. We retained the features with an importance greater than 1%, which ultimately yielded 10 features. The statistically significant features were combined with genetic algorithms to obtain a superior feature subset.
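As an illustration of the first-stage screening of a binary categorical feature, the sketch below computes a Pearson χ2 test on a 2×2 table and applies the p<0.05 cut-off. It is a minimal sketch under stated assumptions: no continuity correction is applied, the function name is hypothetical, and the counts are invented for illustration and are not study data.

```python
import math

ALPHA = 0.05  # first-stage significance level stated in the text


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table

                     futile   not futile
        feature = 1     a         b
        feature = 0     c         d

    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one patient")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Invented counts for a hypothetical binary feature.
statistic, p_value = chi_square_2x2(40, 20, 30, 50)
passes_first_stage = p_value < ALPHA
print(round(statistic, 2), passes_first_stage)
```

Only features that pass this screen would move on to the ranking stage; continuous features would be screened with a t-test instead.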
Model training
The multistage predictive model was developed with the selected features based on a 10-fold nested cross-validation (CV) framework composed of an outer CV loop and an inner CV loop. During the nested CV, five baseline ML algorithms were used: RF, gcForest, support vector machines, extreme gradient boosting and k-nearest neighbour. The hyperparameters of these models were optimised using genetic algorithms. All ML algorithms were implemented using prebuilt approaches available in the Python module scikit-learn (V.0.24.2). Details of the model training process are presented in online supplemental material.
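The structure of a nested cross-validation can be sketched with index splits alone. In the sketch below, the outer loop reserves a test fold for performance estimation, and the inner loop partitions only the remaining patients for tuning. The function names, the number of inner folds and the absence of stratification are assumptions, since the text specifies only a 10-fold nested framework.

```python
import random


def k_fold_indices(indices, k, seed=0):
    """Shuffle indices and yield (train, test) lists for each of k folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def nested_cv_splits(n_samples, outer_k=10, inner_k=10, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for each outer fold.

    inner_splits partitions outer_train only, so the outer test fold is never
    seen during feature selection or hyperparameter tuning.
    """
    outer = k_fold_indices(range(n_samples), outer_k, seed)
    for fold_no, (outer_train, outer_test) in enumerate(outer):
        inner = list(k_fold_indices(outer_train, inner_k, seed + fold_no + 1))
        yield outer_train, outer_test, inner


for outer_train, outer_test, inner in nested_cv_splits(50, outer_k=5, inner_k=3):
    print(len(outer_train), len(outer_test), len(inner))
```

In a full pipeline, a model would be tuned on the inner splits and then scored once on the corresponding outer test fold.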
Performance evaluation
The area under the receiver operating characteristics curve (AUC), F-score and log loss were used to compare the performance of different models. Moreover, sensitivity, specificity, accuracy and precision were also considered as auxiliary indicators for the general evaluation of the forecasting model characteristics. To assist doctors in the clinical decision-making process, two cut-off thresholds were set up for the predictive probability [Registration study for Critical Care of Acute Ischemic Stroke: Prediction of Futile Recanalisation (RESCUE-FR) Index] of the proposed model (RESCUE-FR Model) to divide the patient population into the low-risk group, intermediate-risk group and high-risk group.
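For reference, the evaluation metrics named above can be computed from observed outcomes and predicted probabilities as in the following sketch. The function names and example values are hypothetical. The F-score is taken to be F1, and a 0.5 decision threshold is used for the threshold-dependent metrics; both are assumptions, as the text does not specify them.

```python
import math


def auc(y_true, y_prob):
    """AUC as the chance that a positive case outranks a negative one."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the observed binary outcomes."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


def threshold_metrics(y_true, y_prob, threshold=0.5):
    """Sensitivity, specificity, accuracy, precision and F1 at a threshold."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    ratio = lambda num, den: num / den if den else float("nan")
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Invented example: 1 = futile recanalisation, 0 = favourable outcome.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
metrics = threshold_metrics(y_true, y_prob)
print(round(auc(y_true, y_prob), 3), round(log_loss(y_true, y_prob), 3), metrics)
```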
External model validation
To check for overfitting, the effectiveness and generalisability of the model were verified in an independent multicentre prospective cohort comprising acute stroke patients with LVO who received EVT at four stroke centres. Following internal validation, the top 10 features with the highest selection frequency and the hyperparameters with the largest AUC across all internal CV folds were selected as new model inputs. These inputs were then used to retrain the model in the derivation cohort, and its performance was evaluated in the validation cohort. Similarly, the accuracy of the risk stratification was validated in the validation cohort following the same risk stratification method used in the derivation cohort. In addition, the performance of models and risk scores from previous research was compared with that of the RESCUE-FR model in the validation cohort. The performance of each previous model was calculated according to the risk score formula provided in the corresponding research article.
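The step of carrying forward the most frequently selected features and the best-scoring hyperparameters from the internal folds might look like the sketch below. The function names, the record layout of `fold_results`, the tie-breaking rule and all feature and parameter names are hypothetical; the text specifies only that selection was by frequency and by largest AUC.

```python
from collections import Counter


def most_frequent_features(selected_per_fold, top_n=10):
    """Rank features by how many CV folds selected them and keep top_n."""
    counts = Counter(f for fold in selected_per_fold for f in set(fold))
    # Descending frequency, then alphabetical so that ties are reproducible.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:top_n]]


def best_hyperparameters(fold_results):
    """Return the hyperparameters of the fold with the largest AUC."""
    return max(fold_results, key=lambda r: r["auc"])["params"]


# Invented fold outputs with hypothetical names.
selected_per_fold = [
    ["age", "nihss_24h", "infarct_volume_24h"],
    ["age", "nihss_24h", "glucose"],
    ["age", "baseline_nihss", "nihss_24h"],
]
fold_results = [
    {"auc": 0.81, "params": {"n_estimators": 200}},
    {"auc": 0.84, "params": {"n_estimators": 500}},
    {"auc": 0.79, "params": {"n_estimators": 100}},
]
top_features = most_frequent_features(selected_per_fold, top_n=3)
params = best_hyperparameters(fold_results)
print(top_features, params)
```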
Statistical analysis
Continuous variables were presented as mean±SD or medians (IQR), depending on the distribution of the variable. Categorical variables were presented as numbers (percentages). Student’s t-tests or Mann-Whitney U tests were used for the comparison of continuous variables. χ2 tests or Fisher’s exact tests were used for the comparison of categorical variables. In the derivation cohort, the TFSP in feature selection included two steps: selection of variables with p<0.05 and selection of the top 10 important features using feature importance ranking. Comparisons of AUC were performed using the DeLong test. A two-sided p<0.05 was considered statistically significant. All statistical analyses were performed by using SAS software V.9.4 (SAS Institute).
Results
Patient characteristics
A total of 945 patients (65±12 years, 605 (64%) males) from the RESCUE-RE registry11 who met the inclusion criteria were included in the final derivation cohort. A total of 177 eligible patients (64±13 years, 122 (69%) males) who underwent EVT from October 2020 to September 2021 were prospectively enrolled and included in the validation cohort (online supplemental figure 2). After a 90-day follow-up, 510 (54%) patients in the derivation cohort had unfavourable outcomes, and there were 107 (60%) cases of futile recanalisation in the validation cohort. Baseline patient characteristics, workflow and essential outcome features were comparable between the derivation and validation cohorts (table 1). Online supplemental table 2 provides additional details on both cohorts.
Model performance in different models using all features
There were about 2000 variables in the RESCUE-RE study database. After manual selection and data preprocessing, 101 variables were analysed in feature extraction across 6 models according to the time of acquisition. The performance of the five ML methods, which allowed for integrative modelling of all baseline and peri-interventional features for predicting futile recanalisation, is illustrated in online supplemental figure 3 and online supplemental table 3. In short, models 1–6, using the best-performing ML method, predicted futile recanalisation in the derivation cohort with AUCs ranging from 0.71 to 0.86 and accuracies from 0.65 to 0.78.
Feature extraction of the models
For the feature importance analysis, TFSP was employed to determine the rankings of the 101 features. The statistically significant (p<0.05) features selected in the first stage are shown in online supplemental table 4. Performance comparisons of similar models with varying numbers of features showed that model performance might not decrease as long as more than 10 features were included for modelling (online supplemental figure 4). The top 10 features for each of the 6 models are shown in online supplemental figure 5. The feature importance evaluation showed that a total of 10 features had more than 1% importance (online supplemental figure 6). An ablation experiment exploring the importance of genetic algorithms in feature selection is presented in online supplemental table 5.
Model performance in different models using TFSP (10 features)
The futile recanalisation prediction performance of 6 models using TFSP (10 features) with 5 ML methods in the derivation cohort is illustrated in figure 2 and online supplemental table 2. Among the evaluated ML classifiers, RF and gcForest achieved the largest AUCs for the prediction of futile recanalisation across all clinical scenarios. In summary, models 1–6, using the best-performing ML method, predicted futile recanalisation in the derivation cohort with AUCs ranging from 0.71 to 0.85 and accuracies from 0.63 to 0.78. Notably, model 6, when using the RF algorithm, significantly outperformed the other models across all ML algorithms, with an AUC of 0.85 (95% CI 0.78 to 0.93).
The optimal models using TFSP (10 features) demonstrated comparable prediction and discrimination abilities to those using all features in model 1, model 2 and model 6 (online supplemental figure 7). However, models 3–5 showed improved performance when using all features.
External validation and comparison of the RESCUE-FR model with the pre-existing models and scores
The performance of model 6 using TFSP (10 features) in the validation cohort is shown in table 2. Model 6 achieved the best results in the validation cohort, with an accuracy of 75%, a recall (sensitivity) of 78%, a specificity of 71%, an F-score of 0.75 and an AUC of 0.80 (0.73–0.87).
The performance of the RESCUE-FR Model (model 6) was compared with that of the previous conventional ML methods, statistical methods and risk scores using all available parameters for predicting futile recanalisation in the validation cohort. The model proposed in the current research significantly outperformed the others, with an AUC of 0.80 compared with AUCs ranging from 0.56 to 0.71 (p<0.05) (figure 3).
ML-based risk stratification: RESCUE-FR index
Based on the predicted probability of futile recanalisation generated by the RESCUE-FR model, appropriate thresholds need to be chosen to classify patients into three groups (low risk, intermediate risk and high risk). We calculated the number of patients with poor outcomes (mRS:3–6) and good outcomes (mRS:0–2) in the derivation cohort when the predicted probability was less than 0.1, 0.2, 0.25, 0.3, 0.35, 0.4, 0.5 and greater than 0.4, 0.5, 0.6, 0.65, 0.7, 0.75, 0.8 (online supplemental tables 6,7). We selected thresholds that would divide one-third of the population (945/3, 315 patients) into each risk category: 0.35 as low-risk threshold and 0.7 as high-risk threshold. By applying this risk stratification standard (low risk <35%, intermediate risk 35%–70% and high risk >70%), we achieved classification accuracies of 92.41% for low risk and 98.73% for high risk in the derivation cohort. In the validation cohort, the classification accuracies for low risk and high risk were approximately 80% (online supplemental figure 8). The proposed RESCUE-FR model has been transformed into a web application (RESCUE-FR index) that provides predictions of futile recanalisation for individual AIS patients who underwent EVT based on the ten features used in model 6. This web application shows the predicted probability and risk stratification with bar charts (figure 4). This web application is accessible online at fr-index.biomind.cn/RESCUE-FR/.
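The three-level stratification rule stated above (low risk <35%, intermediate risk 35%–70%, high risk >70%) can be sketched as follows. The function names and example values are hypothetical. The per-group accuracy is computed here as the share of low-risk patients with a good outcome or of high-risk patients with a poor outcome, which is an assumed reading of the reported classification accuracies.

```python
LOW_RISK_THRESHOLD = 0.35   # below this: low risk
HIGH_RISK_THRESHOLD = 0.70  # above this: high risk


def risk_group(probability: float) -> str:
    """Map a predicted probability of futile recanalisation to a risk group."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability < LOW_RISK_THRESHOLD:
        return "low"
    if probability > HIGH_RISK_THRESHOLD:
        return "high"
    return "intermediate"


def group_accuracy(probabilities, poor_outcomes, group):
    """Share of patients in a group whose outcome matches the group label.

    Low risk counts as correct when the outcome is good (mRS 0-2); high risk
    counts as correct when the outcome is poor (mRS 3-6).
    """
    if group not in ("low", "high"):
        raise ValueError("accuracy is defined for 'low' and 'high' only")
    members = [poor for p, poor in zip(probabilities, poor_outcomes)
               if risk_group(p) == group]
    if not members:
        return None
    correct = sum(poor == (group == "high") for poor in members)
    return correct / len(members)


# Invented predictions and outcomes (True = mRS 3-6 at 90 days).
probabilities = [0.10, 0.20, 0.30, 0.50, 0.80, 0.90, 0.75]
poor_outcomes = [False, False, True, True, True, True, False]
print([risk_group(p) for p in probabilities])
print(group_accuracy(probabilities, poor_outcomes, "low"),
      group_accuracy(probabilities, poor_outcomes, "high"))
```

Probabilities exactly at 0.35 or 0.70 fall into the intermediate group, following the strict inequalities in the stated rule.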
Discussion
This study developed and tested a hybrid ML-based risk stratification tool that took into account all relevant features of the general workflow of EVT and perioperative management. This tool accurately predicts futile recanalisation in AIS patients undergoing EVT within the time frame from ER arrival to 24 hours post-EVT. Among the evaluated ML classifiers, RF and gcForest emerged as the top performers. Consequently, RF was used to create the RESCUE-FR Model. With a larger AUC in the external validation, the RESCUE-FR model significantly outperformed other currently available models and risk scores and identified a high-risk group characterised by a smaller size and a higher proportion of futile recanalisation cases. In addition, an online calculator (available at fr-index.biomind.cn/RESCUE-FR/) was developed to enable the convenient, interactive and personalised calculation of futile recanalisation probability in AIS patients undergoing EVT. The innovation in our study lies in the feature screening method and the model optimisation (hybrid ML), which go beyond the mere utilisation of existing ML models. Additionally, our model's capacity to make predictions at multiple time points is highly relevant to clinical application. Notably, the RESCUE-FR model achieved a larger AUC in external validation than the existing models and risk scores with which it was compared.
A major limitation of existing risk scores is their limited reliability and effectiveness in the risk assessment at the individual patient level, as demonstrated by the difficulty in extrapolating outcome estimates from large clinical trials.13 However, individual prognostication is essential in developing appropriate personalised treatment plans and making critical medical decisions across various clinical scenarios. The simultaneous interpretation of multiple risk predictors for a single patient poses a formidable challenge for clinicians.
Our study demonstrated that hybrid ML can effectively address these challenges by leveraging complex higher-level interactions among numerous clinical features. Consequently, the proposed RESCUE-FR model exhibited improved discrimination and prediction capability for futile recanalisation compared with existing risk scores and models. Moreover, the RESCUE-FR model showed the potential to identify patients with a constantly increasing risk of futile recanalisation throughout the entire peri-EVT period, a scenario not explicitly addressed in existing clinical practice guidelines. The current study rigorously evaluated ML algorithms using different classifiers and explored a wide range of hyperparameters. Among the evaluated algorithms, the RF classifier outperformed the others, consistent with previous studies using ML to predict clinical endpoints.
Commonly used clinical predictors in EVT include age and stroke severity measured by the NIHSS or the Canadian Neurological Score. CT score and Alberta Stroke Programme Early CT score are the most widely used radiology assessment tools. The current study identified several significant predictors of futile recanalisation, many of which have been previously reported as influencing factors in EVT outcomes, such as age, baseline NIHSS score, NIHSS score after 24 hours, time metric of the procedure, volume of the infarct and several laboratory test results from blood biochemistry and routine blood tests. Most models, including the RESCUE-FR Model, predict outcomes after EVT but do not include peritreatment information. While the current study initially considered peritreatment information, it was subsequently removed after feature selection. Nonetheless, such information could still be of added value to patient assessment. However, as ML models capture higher dimensional, non-linear interactions among features at different processes and time points, it is challenging to assess the independent impact of each variable on the predicted probability of futile recanalisation. Therefore, it is possible that some important features were removed during the ML feature selection process, such as baseline imaging characteristics obscured by infarct volume at 24 hours, other follow-up clinical characteristics and their non-linear interactions in the proposed model.
This study has several strengths and limitations to be acknowledged. Most ML-based models require a broad spectrum of input variables, which might discourage clinicians from using them at first glance. Therefore, the score in the current study was designed to accommodate missing values and to reduce the number of non-significant features. Nevertheless, the presence of multiple missing variables and feature selection may compromise its reliability. Another limitation of the proposed risk score model is the lack of an impact analysis to determine how its utilisation improves patient care and outcomes. Hence, future investigations should aim to identify treatment regimens that are specifically tailored to the different risk levels assessed by the RESCUE-FR index. Since the application of ML depends on the robustness of the database, practical use of the proposed model in patient care requires careful and structured data collection. Additionally, it is important to note the limitations associated with the small size of the validation cohort and the constraints inherent to the ML algorithms used in this study (eg, extended training time, large memory consumption, limited interpretability of RF, limited scalability and training complexity of gcForest). However, as large and structured databases become more widely available, addressing these limitations may become more feasible.
To our knowledge, RESCUE-FR index is the first accurate and externally validated hybrid ML algorithm with real-world applicability for guiding clinicians in the prediction of futile recanalisation. Therefore, integration of this tool into daily clinical practice may facilitate optimal candidate selection and prognostication of patients undergoing EVT.
Ethics statements
Patient consent for publication
Ethics approval
This study involves human participants and the study protocol was approved by the medical ethics committee of the Beijing Tiantan Hospital, Capital Medical University (No. KY2018-057-01). Participants gave informed consent to participate in the study before taking part.
Footnotes
Contributors LL, TL and ZM conceived the idea and supervised the research. YJ, ZL, ZW and XM supervised the dataset creation and annotation. YW, XLi and JChen supervised feature extraction. JY, DL, GG, JCheng, XLiu and TZ developed the ML algorithms. ZY, MW, WG, YP and HY performed statistical analyses and interpreted findings. The manuscript was written by XN with comments from all other coauthors. LL is responsible for the overall content as guarantor.
Funding This study was supported by National Natural Science Foundation of China (82001920, 82071301, 81820108012) and Beijing Municipal Administration of Hospitals’ Youth Programme (QML20210503).
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.