Early detection of glaucoma is essential for preventing visual impairment. Artificial intelligence (AI) can analyze color fundus photographs (CFPs) cost-effectively, making glaucoma screening more accessible. Although AI models for glaucoma screening from CFPs have shown promising results in laboratory settings, their performance drops significantly in real-world scenarios because of out-of-distribution and low-quality images. To address this issue, we propose the Artificial Intelligence for Robust Glaucoma Screening (AIROGS) challenge. The challenge includes a large dataset of around 113,000 images from about 60,000 patients and 500 different screening centers, and it encourages the development of algorithms that are robust to ungradable and unexpected input data. In this paper, we evaluate solutions from 14 teams and find that the best teams performed similarly to a set of 20 expert ophthalmologists and optometrists. The highest-scoring team achieved an area under the receiver operating characteristic curve of 0.99 (95% CI: 0.98-0.99) for detecting ungradable images on the fly. Additionally, many of the algorithms showed robust performance when tested on three other publicly available datasets. These results demonstrate the feasibility of robust AI-enabled glaucoma screening.