Discussion
In this study, we developed a multitask DL algorithm capable of detecting ischaemic lesions of any type and in any brain location using 5772 CT brain scans collected from patients who had stroke and labelled but not annotated for lesion location/extent. Our best-performing method achieved an accuracy of 72% in detecting ischaemic lesions and performed better on follow-up scans than on baseline scans, which is consistent with human performance.
We also investigated the impact of lesion location, lesion type, lesion size and background brain changes on the performance of our DL system. Training a DL model requires a large number of examples.22 23 In our study, however, the distribution and type of ischaemic lesions encountered were highly skewed, with most cases showing lesions caused by large-medium vessel occlusion affecting the MCA territory of the brain. As a result, our algorithm was less successful in detecting less frequently occurring lesions, such as brain stem, lacunar and cerebellar lesions, for which there were fewer example cases. Furthermore, some ischaemic lesions are much smaller than others, which also affected the performance of our model.
We also analysed four types of background brain changes and found that our DL system had the highest classification error for scans with old stroke lesions and scans with other lesion types not related to stroke. However, a balanced data set in which each feature is equally represented would be required to determine how strongly specific acute lesions or background brain changes confound the DL system. Further studies are needed to address this issue.
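As an illustration of the kind of subgroup comparison described above, the following minimal sketch computes a classification error rate per subgroup and reports the number of cases alongside it, so that sparsely represented subgroups are easy to spot. The function name, the record layout and the toy subgroup labels are assumptions made for this example and are not taken from our analysis code.

```python
from collections import defaultdict


def error_rate_by_subgroup(records):
    """Return {subgroup: (error_rate, n_cases)}.

    `records` is an iterable of (subgroup, true_label, predicted_label)
    tuples. This layout is hypothetical and chosen only for illustration.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for subgroup, truth, prediction in records:
        totals[subgroup] += 1
        errors[subgroup] += int(truth != prediction)
    return {g: (errors[g] / totals[g], totals[g]) for g in totals}


# Toy data only: 1 = ischaemic lesion present, 0 = absent.
toy_records = [
    ("old stroke lesion", 1, 0),
    ("old stroke lesion", 0, 1),
    ("old stroke lesion", 1, 1),
    ("no background change", 1, 1),
    ("no background change", 0, 0),
]

for group, (rate, n) in error_rate_by_subgroup(toy_records).items():
    print(f"{group}: error rate {rate:.2f} (n={n})")
```

Reporting the case count next to each rate makes it explicit when a subgroup is too small for its error rate to be interpreted with confidence.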
The average agreement between our algorithm and seven experts was relatively low compared with the agreement among the seven experts. There are likely multiple reasons for this. First, ground truth is not always obtainable in medical imaging, and our analysis was based on a clinical gold standard reference that was qualitatively assessed by a single expert; such assessment is known to be imperfect and heavily influenced by clinician experience. In other words, our DL system learnt from the best available data, but the data were imperfect. Second, the expert agreement data we used included both CT and corresponding CTA data for each patient, whereas our DL method used only the CT images. The addition of CTA makes it more likely that our experts reached the correct answer (and thus agreed) for each scan. In fact, using data from a separate analysis, we observed lower agreement among experts when only CT images were provided, and this was more similar to our expert-DL agreement.
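To make the comparison between expert-expert and expert-DL agreement concrete, the sketch below averages a pairwise agreement statistic over all expert pairs and over all model-expert pairs. Cohen's kappa is used here purely as an assumed, commonly used statistic; this section does not specify which agreement measure was applied, and the function names and toy ratings are hypothetical.

```python
from itertools import combinations
from statistics import mean


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    # Proportion of scans on which the two raters gave the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    labels = set(a) | set(b)
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


def mean_expert_agreement(experts):
    """Mean kappa over every pair of experts."""
    return mean(cohen_kappa(x, y) for x, y in combinations(experts, 2))


def mean_model_agreement(model, experts):
    """Mean kappa between the model and each expert in turn."""
    return mean(cohen_kappa(model, e) for e in experts)


# Toy ratings only (1 = lesion present, 0 = absent); not study data.
toy_experts = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 1]]
toy_model = [1, 0, 0, 1]

print(mean_expert_agreement(toy_experts))
print(mean_model_agreement(toy_model, toy_experts))
```

Because kappa corrects for chance agreement, it is less flattering than raw percentage agreement when one label dominates, which is one reason it is often preferred for rater comparisons of this kind.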
Early detection of ischaemic stroke is important for improving patient outcomes due to the time-sensitive nature of available treatments. Accurate early detection influences several aspects of acute stroke management such as appropriate patient prioritisation in emergency settings, selection for treatment with thrombolysis and/or thrombectomy and the early initiation of secondary prevention measures which can reduce the risk of recurrent strokes.
However, despite its importance, early detection of ischaemic stroke presents several challenges, such as the subtle presentation of early-stage lesions and the presence of stroke mimics and chameleons. These challenges were demonstrated in a previous analysis of a commercially available tool, which revealed many shortcomings.21
The superior performance of our model on follow-up scans compared with baseline scans aligns with human diagnostic behaviour. Moreover, follow-up predictions still provide significant value in stroke management and patient care. They enable clinicians to evaluate the effectiveness of initial treatments and thereby better predict outcomes or plan additional interventions such as hemicraniectomy.
Since some stroke-related complications may not manifest during the acute phase, follow-up predictions can aid in detecting delayed issues such as cerebral swelling, haemorrhagic transformation or other secondary events. Additionally, tracking lesion progression offers valuable insights for shaping rehabilitation strategies, allowing for personalised therapies based on the patient’s evolving condition and recovery potential.
Interpretability of DL models, particularly in the context of medical imaging, is a challenging topic due to the so-called ‘black box’ nature of these models. However, understanding how these models arrive at their decisions is critical for ensuring their reliability and detecting any potential biases.24 To address this issue, we employed counterfactual explanations and generated saliency maps that highlight the parts of the images most relevant to our model’s output. Our saliency maps showed that our DL algorithm detected obvious ischaemic lesions with high accuracy, but that it was less certain about the location of more subtle lesions and could highlight regions outside the true lesion. This behaviour is consistent with that of humans.
Other authors employed a two-stage network to combine local and global information for ischaemic stroke lesion detection,25 obtaining 87% accuracy. However, in addition to CT scans, they also used diffusion-weighted imaging (DWI) MR images (which are highly sensitive for early ischaemia but not routinely used in most centres), and their data set comprised only 277 patients. Mirajkar et al26 also used a combination of CT and DWI images for the segmentation of stroke lesions. In contrast, our study focuses solely on CT scans and involves a larger-scale investigation to establish a benchmark for this imaging modality. By doing so, we have tried to demonstrate that it is possible to develop future stroke detection algorithms based on routinely acquired (as opposed to optimised for research) CT imaging alone, since this is the most widely used imaging modality for acute stroke.
A limitation of our study is that culprit ischaemic lesions may not be visible on CT scans, especially at baseline. This could lead to incorrect labelling of scans. Using healthy controls would have been an option, but it is not ethical to scan truly normal individuals with CT because of the associated radiation, and ‘normal’ CTs acquired from other individuals for other reasons may include confounding features. The second limitation is that subgroup analyses exploring the impact of lesion location, lesion number and other chronic features suffer from small numbers of cases in many of the categories.