EyeAgent: An Agentic AI System for Multimodal Clinical Decision Support in Ophthalmology

Danli Shi, Xiaolan Chen, Bingjie Yan, Weiyi Zhang, Pusheng Xu, Jiancheng Yang, Ruoyu Chen, Siyu Huang, Bowen Liu, Xinyuan Wu, Meng Xie, Ziyu Gao, Yue Wu, Senlin Lin, Kai Jin, Xia Gong, Yih Chung Tham, Xiujuan Zhang, Li Dong, Yuzhou Zhang, Jason Yam, Guangming Jin, Xiaohu Ding, Haidong Zou, Yalin Zheng, Zongyuan Ge, Mingguang He

From arXiv; 28 pages, 5 figures

Artificial intelligence has shown promise in medical imaging, yet most existing systems lack flexibility, interpretability, and adaptability, challenges that are especially pronounced in ophthalmology, where diverse imaging modalities are essential. We present EyeAgent, the first agentic AI framework for comprehensive and interpretable clinical decision support in ophthalmology. Using a large language model (DeepSeek-V3) as its central reasoning engine, EyeAgent interprets user queries and dynamically orchestrates 53 validated ophthalmic tools across 23 imaging modalities for diverse tasks including classification, segmentation, detection, image/report generation, and quantitative analysis. Stepwise ablation analysis demonstrated a progressive improvement in diagnostic accuracy, rising from a baseline of 69.71% (using only 5 general tools) to 80.79% when the full suite of 53 specialized tools was integrated. In an expert rating study on 200 real-world clinical cases, EyeAgent achieved 93.7% tool selection accuracy and received expert ratings of more than 88% across accuracy, completeness, safety, reasoning, and interpretability. In human-AI collaboration, EyeAgent matched or exceeded the performance of senior ophthalmologists and, when used as an assistant, improved overall diagnostic accuracy by 18.51% and report quality scores by 19%, with the greatest benefit observed among junior ophthalmologists. These findings establish EyeAgent as a scalable and trustworthy AI framework for ophthalmology and provide a blueprint for modular, multimodal, and clinically aligned next-generation AI systems.
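
The orchestration pattern described in the abstract, a central planner that maps a query to tasks and dispatches modality-specific tools, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the EyeAgent implementation: the `Tool` and `ToolRegistry` classes, the tool names, the "CFP" modality label, and the keyword-based `plan_tasks` function, which merely stands in for the LLM reasoning step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Tool:
    """A hypothetical tool entry; not the paper's actual tool interface."""
    name: str
    modality: str               # e.g. "CFP" (colour fundus photo), illustrative label
    task: str                   # e.g. "classification", "segmentation"
    run: Callable[[str], str]   # takes an image path, returns a text result


class ToolRegistry:
    """Indexes tools by (modality, task) so a planner can look them up."""

    def __init__(self) -> None:
        self._tools: Dict[Tuple[str, str], List[Tool]] = {}

    def register(self, tool: Tool) -> None:
        self._tools.setdefault((tool.modality, tool.task), []).append(tool)

    def select(self, modality: str, task: str) -> List[Tool]:
        return list(self._tools.get((modality, task), []))


def plan_tasks(query: str) -> List[str]:
    """Keyword stand-in for the LLM planner (assumption: the real system
    would ask the language model which tasks the query requires)."""
    keywords = {
        "segment": "segmentation",
        "classif": "classification",
        "diagnos": "classification",
    }
    lowered = query.lower()
    tasks: List[str] = []
    for key, task in keywords.items():
        if key in lowered and task not in tasks:
            tasks.append(task)
    return tasks


def answer(registry: ToolRegistry, query: str, modality: str,
           image_path: str) -> List[Tuple[str, str]]:
    """Run every matching tool and return a (tool name, output) trace,
    which is what makes the final answer inspectable."""
    trace: List[Tuple[str, str]] = []
    for task in plan_tasks(query):
        for tool in registry.select(modality, task):
            trace.append((tool.name, tool.run(image_path)))
    return trace


def build_demo_registry() -> ToolRegistry:
    """Two stub tools with made-up names and canned outputs."""
    registry = ToolRegistry()
    registry.register(Tool("cfp_vessel_seg", "CFP", "segmentation",
                           lambda path: f"vessel mask for {path}"))
    registry.register(Tool("cfp_dr_grader", "CFP", "classification",
                           lambda path: f"DR grade for {path}"))
    return registry
```

Keeping the registry separate from the planner is one plausible reading of why such a design scales: adding a tool is a single `register` call, and the returned trace records which tools contributed to an answer.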
