Automated optic disc (OD) and optic cup (OC) segmentation in fundus images is relevant for efficiently measuring the vertical cup-to-disc ratio (vCDR), a biomarker commonly used in ophthalmology to determine the degree of glaucomatous optic neuropathy. In general, this task is solved using coarse-to-fine deep learning algorithms, in which a first stage approximates the location of the OD and a second stage uses a crop of this area to predict the OD/OC masks. While this approach is widely applied in the literature, there are no studies analyzing its real contribution to the results. In this paper we present a comprehensive analysis of different coarse-to-fine designs for OD/OC segmentation using 5 public databases, both from a standard segmentation perspective and for estimating the vCDR for glaucoma assessment. Our analysis shows that these algorithms do not necessarily outperform standard multi-class single-stage models, especially when the latter are learned from sufficiently large and diverse training sets. Furthermore, we noticed that the coarse stage achieves better OD segmentation results than the fine one, and that providing OD supervision to the second stage is essential to ensure accurate OC masks. Moreover, both the single-stage and two-stage models trained in a multi-dataset setting showed results on par with or even better than other state-of-the-art alternatives, while ranking first in REFUGE for OD/OC segmentation. Finally, we evaluated the models for vCDR prediction in comparison with six ophthalmologists on a subset of AIROGS images, to interpret them in the context of inter-observer variability. We noticed that vCDR estimates recovered from both single-stage and coarse-to-fine models can achieve good glaucoma detection results even when they are not highly correlated with manual measurements from experts.
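The vCDR is commonly defined as the vertical diameter of the optic cup divided by the vertical diameter of the optic disc. The following is a minimal sketch of how such an estimate could be recovered from segmentation output; it is not the implementation used in this work. It assumes masks are nested lists of 0/1 values and that a multi-class model outputs a label map with the hypothetical convention 0 = background, 1 = disc (rim), 2 = cup, where the cup lies inside the disc. The vertical diameter is approximated as the number of rows spanned by the foreground.

```python
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # binary mask (rows x cols), 1 = structure present


def vertical_extent(mask: Mask) -> int:
    """Rows spanned by the foreground, from top-most to bottom-most row (inclusive)."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    if not rows:
        return 0
    return rows[-1] - rows[0] + 1


def masks_from_labels(
    labels: Sequence[Sequence[int]], disc_label: int = 1, cup_label: int = 2
) -> Tuple[List[List[int]], List[List[int]]]:
    """Split a label map into OD and OC masks (hypothetical label convention).

    The cup is assumed to lie inside the disc, so the OD mask is disc OR cup.
    """
    od = [[1 if v in (disc_label, cup_label) else 0 for v in row] for row in labels]
    oc = [[1 if v == cup_label else 0 for v in row] for row in labels]
    return od, oc


def vertical_cdr(od_mask: Mask, oc_mask: Mask) -> float:
    """vCDR = vertical cup diameter / vertical disc diameter."""
    disc = vertical_extent(od_mask)
    if disc == 0:
        raise ValueError("optic disc mask is empty")
    return vertical_extent(oc_mask) / disc


if __name__ == "__main__":
    # Toy label map: the disc spans rows 1-4, the cup spans rows 2-3.
    labels = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 2, 1, 0],
        [0, 1, 2, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    od, oc = masks_from_labels(labels)
    print(vertical_cdr(od, oc))  # 2 / 4 = 0.5
```

Measuring extents on masks keeps the estimate independent of how they were produced, so the same function applies to the output of a single-stage model or of the fine stage of a coarse-to-fine pipeline (after mapping the crop back to image coordinates, if needed).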