Rethinking Glaucoma Calibration: Voting-Based Binocular and Metadata Integration

Glaucoma is an incurable ophthalmic disease that damages the optic nerve, leads to vision loss, and ranks among the leading causes of blindness worldwide. Diagnosing glaucoma typically involves fundus photography, optical coherence tomography (OCT), and visual field testing. However, the high cost of OCT often leads to reliance on fundus photography and visual field testing, both of which exhibit inherent inter-observer variability. This variability stems from glaucoma being a multifaceted disease that is influenced by various factors. As a result, glaucoma diagnosis is highly subjective, which underscores the need for calibration, that is, aligning predicted probabilities with the actual likelihood of disease. Proper calibration is essential to prevent overdiagnosis and misdiagnosis, both critical concerns for high-risk diseases. Although AI has significantly improved diagnostic accuracy, model overconfidence has worsened calibration performance. Recent studies have begun to focus on calibration for glaucoma. Nevertheless, previous studies have not fully considered glaucoma's systemic nature or the high subjectivity of its diagnostic process. To overcome these limitations, we propose V-ViT (Voting-based ViT), a novel framework that enhances calibration by incorporating disease-specific characteristics. V-ViT integrates binocular data and metadata, reflecting the multifaceted nature of glaucoma diagnosis. Additionally, we introduce an MC dropout-based Voting System to address the high subjectivity. Our approach achieves state-of-the-art performance across all metrics, including accuracy, demonstrating that the proposed methods are effective in addressing calibration issues. We validate our method on a custom dataset that includes binocular data.
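
To make the notion of calibration concrete, the following minimal Python sketch computes the expected calibration error (ECE), a common calibration metric, for a binary glaucoma classifier, and shows one simple way to aggregate several stochastic (MC dropout) forward passes into a vote. It is an illustration under stated assumptions, not the V-ViT implementation: the function names, the 10-bin equal-width scheme, the 0.5 decision threshold, and the mean-plus-vote-fraction aggregation are all hypothetical choices that the abstract does not specify.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Binary ECE: sample-weighted mean of |accuracy - confidence| over
    equal-width confidence bins (assumed binning scheme)."""
    if len(probs) != len(labels) or len(probs) == 0:
        raise ValueError("probs and labels must be non-empty and equally long")
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0
        # Confidence in the predicted class, always in [0.5, 1].
        conf = p if pred == 1 else 1.0 - p
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, int(pred == y)))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(hit for _, hit in b) / len(b)
        ece += (len(b) / n) * abs(acc - avg_conf)
    return ece


def mc_dropout_vote(
    pass_probs: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Aggregate T stochastic forward passes for one patient.

    Returns (mean probability, fraction of passes voting 'glaucoma').
    The passes themselves would come from a model run with dropout
    left active at inference time; that model is not shown here.
    """
    if len(pass_probs) == 0:
        raise ValueError("pass_probs must be non-empty")
    mean_prob = sum(pass_probs) / len(pass_probs)
    vote_frac = sum(1 for p in pass_probs if p >= threshold) / len(pass_probs)
    return mean_prob, vote_frac


if __name__ == "__main__":
    # An overconfident model: 99% confident but right only half the time.
    overconfident = expected_calibration_error([0.99] * 10, [1, 0] * 5)
    print(f"ECE of overconfident model: {overconfident:.2f}")
    print(mc_dropout_vote([0.9, 0.8, 0.4, 0.7]))
```

A lower ECE means that predicted probabilities track observed outcomes more closely. In the sketch, disagreement among passes (a vote fraction far from 0 or 1) serves as a rough signal of an uncertain case.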
