Retinal diseases (RD) are the leading cause of severe vision loss or blindness. Deep learning-based automated tools play an indispensable role in assisting clinicians in diagnosing and monitoring RD in modern medicine. Recently, an increasing number of works in this field have taken advantage of Vision Transformers to achieve state-of-the-art performance, at the cost of more parameters and higher model complexity than Convolutional Neural Networks (CNNs). Such sophisticated and task-specific model designs, however, are prone to overfitting, which hinders their generalizability. In this work, we argue that a channel-aware and well-calibrated CNN model may overcome these problems. To this end, we empirically studied the macro and micro designs of CNNs and their training strategies. Based on this investigation, we propose no-new-MobileNet (nn-MobileNet), a model developed for retinal diseases. In our experiments, our generic, simple, and efficient model outperformed most current state-of-the-art methods on four public datasets across multiple tasks, including diabetic retinopathy grading, fundus multi-disease detection, and diabetic macular edema classification. Our work may provide novel insights into deep learning architecture design and advance retinopathy research.