Diabetic peripheral neuropathy (DPN) affects nearly half of patients with diabetes, making early detection essential. Corneal confocal microscopy (CCM) enables non-invasive diagnosis, but existing automated methods suffer from inefficient feature extraction, reliance on handcrafted priors, and limited data. We propose HMSViT, a Hierarchical Masked Self-Supervised Vision Transformer designed for corneal nerve segmentation and DPN diagnosis. Unlike existing methods, HMSViT combines pooling-based hierarchical and dual attention mechanisms with absolute positional encoding. This design captures fine-grained local details in early layers and integrates global context in deeper layers, yielding efficient multi-scale feature extraction at a lower computational cost. A block-masked self-supervised learning (SSL) framework tailored to HMSViT reduces reliance on labelled data and improves feature robustness, while a multi-scale decoder fuses the hierarchical features for segmentation and classification. In experiments on clinical CCM datasets, HMSViT achieves state-of-the-art performance, with 61.34% mIoU for nerve segmentation and 70.40% diagnostic accuracy, outperforming leading hierarchical models such as the Swin Transformer and HiViT by up to 6.39% in segmentation accuracy while using fewer parameters. Ablation studies further show that combining block-masked SSL with hierarchical multi-scale feature extraction substantially improves performance over conventional supervised training. Together, these results indicate that HMSViT is robust and clinically viable, with potential for scalable deployment in real-world diagnostic applications.
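To make the idea of block masking concrete, the following is a minimal sketch of how a block-wise mask over a patch grid might be generated. It is not the HMSViT implementation: the function name `block_mask`, the grid size, the block size, and the mask ratio are all illustrative assumptions, since the abstract does not specify them. The sketch masks whole square blocks of patches rather than individual patches, so that masked regions stay aligned when a hierarchical model pools neighbouring tokens.

```python
import numpy as np


def block_mask(grid_h, grid_w, block, mask_ratio, rng):
    """Return a boolean (grid_h, grid_w) array; True marks a masked patch.

    Hypothetical helper: masking decisions are made per block of
    `block` x `block` patches, not per individual patch.
    """
    if grid_h % block or grid_w % block:
        raise ValueError("grid size must be divisible by block size")

    blocks_h, blocks_w = grid_h // block, grid_w // block
    n_blocks = blocks_h * blocks_w
    n_masked = int(round(mask_ratio * n_blocks))

    # Choose which blocks to mask, uniformly at random without replacement.
    flat = np.zeros(n_blocks, dtype=bool)
    flat[rng.permutation(n_blocks)[:n_masked]] = True
    coarse = flat.reshape(blocks_h, blocks_w)

    # Expand each block-level decision to the patches that block covers.
    return coarse.repeat(block, axis=0).repeat(block, axis=1)


# Example with assumed values: a 16 x 16 patch grid, 4 x 4 blocks, 50% masked.
rng = np.random.default_rng(0)
mask = block_mask(16, 16, block=4, mask_ratio=0.5, rng=rng)
print(mask.astype(int))
```

In a masked SSL setup of this kind, the encoder would typically see only the unmasked patches and a decoder would be trained to reconstruct the masked ones; the block size and ratio used here are placeholders, not values reported for HMSViT.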