Generative AI for automated glaucoma diagnostic report generation faces two main challenges: content redundancy in narrative outputs and inadequate emphasis on pathologically significant features such as optic disc cupping, retinal nerve fiber layer defects, and visual field abnormalities. These limitations stem largely from current multimodal architectures' limited capacity to extract discriminative structural-textural patterns from fundus imaging data while maintaining precise semantic alignment with domain-specific terminology in comprehensive clinical reports. To address these constraints, we present the Dual-Attention Semantic Parallel-LSTM Network (DA-SPL), a multimodal generation framework that jointly processes fundus images and supplementary visual inputs. DA-SPL adopts an encoder-decoder structure with three key components: a joint dual-attention mechanism in the encoder for cross-modal feature refinement, a parallelized LSTM decoder for improved temporal-semantic consistency, and a dedicated label enhancement module for accurate generation of disease-relevant terms. Evaluation on standard glaucoma datasets shows that DA-SPL consistently outperforms state-of-the-art models across quantitative metrics. DA-SPL extracts subtle pathological indicators from multimodal inputs and generates diagnostically precise reports that show strong concordance with clinical expert annotations.
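The joint dual-attention idea can be illustrated with a minimal sketch. The code below is a hypothetical, simplified rendering in numpy (the paper does not specify its exact formulation): it applies a spatial attention branch and a channel attention branch to one feature map and fuses the two by summation, which is one common way to refine encoder features before decoding. The function name `dual_attention` and the feature shapes are assumptions for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dual_attention(feats):
    """Simplified dual attention over a (N, C) feature map
    (N spatial positions, C channels); hypothetical sketch."""
    # Spatial branch: reweight each position by its affinity to all positions
    spatial = softmax(feats @ feats.T, axis=-1) @ feats      # (N, C)
    # Channel branch: reweight each channel by channel-to-channel affinity
    channel = feats @ softmax(feats.T @ feats, axis=-1)      # (N, C)
    # Fuse the two refined views by summation
    return spatial + channel

rng = np.random.default_rng(0)
fundus_feats = rng.standard_normal((16, 32))  # e.g. 16 positions, 32 channels
refined = dual_attention(fundus_feats)
print(refined.shape)  # (16, 32)
```

In a full model these branches would carry learned projection weights and the fused output would feed the parallel LSTM decoder; the sketch only shows the attention bookkeeping.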