Our research is motivated by an urgent global problem: a large population is affected by retinal diseases, which are evenly distributed but underserved by specialized medical expertise, particularly in non-urban areas. Our primary objective is to bridge this healthcare gap by developing a comprehensive diagnostic system that accurately predicts retinal diseases from fundus images alone. We faced significant challenges from limited, diverse datasets and imbalanced class distributions. To overcome them, we introduce hybrid models that combine deeper Convolutional Neural Networks (CNNs), Transformer encoders, and ensemble architectures, arranged sequentially and in parallel, to classify retinal fundus images into 20 disease labels. Our overarching goal is to assess the potential of these models in practical applications, with a focus on improving diagnostic accuracy across a broader spectrum of retinal conditions. Our models surpass the baseline results: the C-Tran ensemble model performs best, achieving a model score of 0.9166 against the baseline score of 0.9. Experiments with the IEViT model showed equally promising outcomes with improved computational efficiency. We also demonstrate the effectiveness of dynamic patch extraction and of integrating domain knowledge in computer vision tasks. In summary, our research aims to contribute to retinal disease diagnosis by addressing the critical need for accessible healthcare solutions in underserved regions while pursuing comprehensive and accurate disease prediction.