A Robust Ensemble Algorithm for Ischemic Stroke Lesion Segmentation: Generalizability and Clinical Utility Beyond the ISLES Challenge

Ezequiel de la Rosa, Mauricio Reyes, Sook-Lei Liew, Alexandre Hutton, Roland Wiest, Johannes Kaesmacher, Uta Hanning, Arsany Hakim, Richard Zubal, Waldo Valenzuela, David Robben, Diana M. Sima, Vincenzo Anania, Arne Brys, James A. Meakin, Anne Mickan, Gabriel Broocks, Christian Heitkamp, Shengbo Gao, Kongming Liang, Ziji Zhang, Md Mahfuzur Rahman Siddiquee, Andriy Myronenko, Pooya Ashtari, Sabine Van Huffel, Hyun-su Jeong, Chi-ho Yoon, Chulhong Kim, Jiayu Huo, Sebastien Ourselin, Rachel Sparks, Albert Clèrigues, Arnau Oliver, Xavier Lladó, Liam Chalcroft, Ioannis Pappas, Jeroen Bertels, Ewout Heylen, Juliette Moreau, Nima Hatami, Carole Frindel, Abdul Qayyum, Moona Mazher, Domenec Puig, Shao-Chieh Lin, Chun-Jung Juan, Tianxi Hu, Lyndon Boone, Maged Goubran, Yi-Jui Liu, Susanne Wegener, Florian Kofler, Ivan Ezhov, Suprosanna Shit, Moritz R. Hernandez Petzsche, Bjoern Menze, Jan S. Kirschke, Benedikt Wiestler

Diffusion-weighted MRI (DWI) is essential for stroke diagnosis, treatment decisions, and prognosis. However, image and disease variability hinder the development of generalizable AI algorithms with clinical value. We address this gap by presenting a novel ensemble algorithm derived from the 2022 Ischemic Stroke Lesion Segmentation (ISLES) challenge. ISLES'22 provided 400 scans of patients with ischemic stroke from various medical centers, enabling the research community to develop a wide range of cutting-edge segmentation algorithms. In collaboration with the leading teams, we combined the top-performing algorithms into an ensemble model that overcomes the limitations of the individual solutions. On our internal test set, the ensemble detected and segmented ischemic lesions more accurately than any individual algorithm, and this accuracy generalized well across diverse image and disease variables. The model also excelled at extracting clinical biomarkers. Notably, in a Turing-like test, neuroradiologists consistently preferred the algorithm's segmentations over manual expert segmentations, highlighting their greater comprehensiveness and precision. Validation on a real-world external dataset (N=1686) confirmed the model's generalizability. The algorithm's outputs also correlated strongly with clinical scores (admission NIHSS and 90-day mRS), on par with or exceeding expert-derived results, which underlines its clinical relevance. This study offers two key findings. First, we present an ensemble algorithm (https://github.com/Tabrisrei/ISLES22_Ensemble) that detects and segments ischemic stroke lesions on DWI across diverse scenarios on par with expert (neuro)radiologists. Second, we show that biomedical challenge outputs can extend beyond a challenge's initial objectives and demonstrate real-world clinical applicability.
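
To make the idea of combining several segmentation algorithms concrete, the following is a minimal sketch only. The abstract does not state how the individual outputs are fused, so strict majority voting over binary masks is an assumption here, and the function names (`majority_vote`, `dice_score`, `lesion_volume_ml`) are hypothetical and not taken from the linked repository. The sketch also shows a Dice overlap score and a lesion volume computation, as examples of a segmentation metric and a biomarker one might extract.

```python
import numpy as np


def majority_vote(masks):
    """Fuse binary lesion masks from several models by strict majority voting.

    Assumption: the fusion rule is illustrative, not the published method.
    """
    stacked = np.stack([np.asarray(m, dtype=bool) for m in masks])
    votes = stacked.sum(axis=0)
    # A voxel is labeled lesion only if more than half of the models agree.
    return votes * 2 > stacked.shape[0]


def dice_score(pred, ref):
    """Dice overlap between two binary masks; defined as 1.0 when both are empty."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        return 1.0
    return 2.0 * np.logical_and(pred, ref).sum() / denom


def lesion_volume_ml(mask, voxel_size_mm):
    """Lesion volume in milliliters from a binary mask and voxel spacing in mm."""
    voxel_mm3 = float(np.prod(voxel_size_mm))
    return float(np.asarray(mask, dtype=bool).sum()) * voxel_mm3 / 1000.0


if __name__ == "__main__":
    # Three toy 1x2x3 "model outputs" standing in for real DWI segmentations.
    model_a = np.array([[[1, 1, 0], [0, 0, 0]]])
    model_b = np.array([[[1, 0, 0], [0, 1, 0]]])
    model_c = np.array([[[1, 1, 0], [0, 0, 1]]])
    reference = np.array([[[1, 1, 0], [0, 0, 0]]])

    fused = majority_vote([model_a, model_b, model_c])
    print(fused.astype(int))
    print(dice_score(fused, reference))
    print(lesion_volume_ml(fused, voxel_size_mm=(2.0, 2.0, 2.0)))
```

In this toy case the isolated voxels predicted by only one model are voted out, which illustrates how an ensemble can suppress errors that individual models do not share. Real pipelines would load the masks from image files and might weight or threshold the votes differently.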
