HealthNLP_Retrievers at ArchEHR-QA 2026: Cascaded LLM Pipeline for Grounded Clinical Question Answering

Patient portals now give individuals direct access to their electronic health records (EHRs), yet access alone does not ensure that patients understand or act on the complex clinical information these records contain. The ArchEHR-QA 2026 shared task addresses this challenge by focusing on grounded question answering over EHRs, and this paper presents the system developed by the HealthNLP_Retrievers team for the task. The proposed approach uses a multi-stage cascaded pipeline powered by the Gemini 2.5 Pro large language model to interpret patient-authored questions and retrieve relevant evidence from lengthy clinical notes. Our architecture comprises four integrated modules: (1) a few-shot query reformulation unit that summarizes verbose patient queries; (2) a heuristic-based evidence scorer that ranks clinical sentences to prioritize recall; (3) a grounded response generator that synthesizes professional-caliber answers restricted strictly to the identified evidence; and (4) a high-precision many-to-many alignment framework that links generated answers to supporting clinical sentences. The system achieved competitive results across the individual tracks, ranking 1st in question interpretation, 5th in answer generation, 7th in evidence identification, and 9th in answer-evidence alignment. These results suggest that integrating large language models within a structured multi-stage pipeline can improve the grounding, precision, and professional quality of patient-oriented health communication. To support reproducibility, our source code is publicly available in our GitHub repository.
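The four-module cascade described above can be illustrated with a minimal sketch. It is not the team's implementation: the function names, the prompts, the token-overlap scoring, and the Jaccard threshold are all assumptions made for illustration, and the language model is passed in as a plain callable rather than through any real Gemini client.

```python
import re
from typing import Callable, Dict, List, Set

# Hypothetical LLM interface: takes a prompt string, returns generated text.
LLM = Callable[[str], str]


def tokenize(text: str) -> Set[str]:
    """Lowercase a string and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def reformulate(question: str, llm: LLM) -> str:
    """Module 1: condense a verbose patient question (prompt is an assumption)."""
    return llm("Summarize this patient question concisely:\n" + question).strip()


def score_evidence(query: str, sentences: List[str], min_overlap: int = 1) -> List[int]:
    """Module 2: rank note sentences by token overlap with the query.

    Token overlap stands in for the paper's heuristic, which is not
    specified here. A low min_overlap keeps more sentences (favors recall).
    """
    q = tokenize(query)
    scored = [(len(q & tokenize(s)), i) for i, s in enumerate(sentences)]
    kept = [(score, i) for score, i in scored if score >= min_overlap]
    kept.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in kept]


def generate_answer(query: str, evidence: List[str], llm: LLM) -> str:
    """Module 3: ask the model to answer using only the selected evidence."""
    prompt = (
        "Answer the question using ONLY the evidence below.\n"
        "Question: " + query + "\nEvidence:\n" + "\n".join(evidence)
    )
    return llm(prompt).strip()


def align(answer: str, sentences: List[str], evidence_ids: List[int],
          threshold: float = 0.2) -> Dict[int, List[int]]:
    """Module 4: link each answer sentence to evidence sentences.

    Uses Jaccard similarity with an assumed threshold; one answer sentence
    may map to several evidence sentences and vice versa (many-to-many).
    """
    answer_sents = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    links: Dict[int, List[int]] = {}
    for a_idx, a_sent in enumerate(answer_sents):
        a_tok = tokenize(a_sent)
        matched = []
        for e_idx in evidence_ids:
            e_tok = tokenize(sentences[e_idx])
            union = a_tok | e_tok
            if union and len(a_tok & e_tok) / len(union) >= threshold:
                matched.append(e_idx)
        if matched:
            links[a_idx] = matched
    return links


def run_pipeline(question: str, note_sentences: List[str], llm: LLM) -> dict:
    """Chain the four modules; each stage consumes the previous stage's output."""
    query = reformulate(question, llm)
    evidence_ids = score_evidence(query, note_sentences)
    answer = generate_answer(query, [note_sentences[i] for i in evidence_ids], llm)
    alignment = align(answer, note_sentences, evidence_ids)
    return {"query": query, "evidence_ids": evidence_ids,
            "answer": answer, "alignment": alignment}
```

In this sketch the evidence scorer is deliberately permissive while the alignment step applies a stricter threshold, mirroring the recall-first, precision-later split described in the abstract. Any callable that maps a prompt to text can be supplied as `llm`, which keeps the cascade testable without network access.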
