Accurate medical coding requires consulting authoritative resources such as the ICD tabular list and coding guidelines. Existing LLM-based automated methods largely rely on LLMs' internal knowledge, which is prone to hallucination and cannot keep pace with guideline updates. We introduce RAG-Coding, an agentic, training-free method that augments LLMs with structured external knowledge: the tabular list is encoded as a knowledge graph capturing hierarchical and instructional code relationships, and the guidelines are distilled into concise, code-specific summaries rather than retrieved as raw text. To enable our study, we also introduce MDACE-2025, expert re-annotations of the MDACE dataset under the 2025 ICD-10-CM/PCS guidelines, adding code sequencing and justification comments. On MDACE, RAG-Coding outperforms the best LLM-based baseline by 3--13\% in micro-F1 across five LLM backbones, and achieves comparable micro- and macro-F1 to the supervised state-of-the-art, with higher recall ($+$11\%) at the cost of precision ($-$6\%). On MDACE-2025, RAG-Coding outperforms all baselines, demonstrating effective generalisation to updated guidelines. Ablations confirm stepwise gains, highlighting the importance of integrating structured external knowledge for LLM-based medical coding.
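As a rough illustration of what encoding the tabular list as a knowledge graph with hierarchical and instructional relationships could look like, the following minimal Python sketch stores typed edges between codes and collects the instructional notes that apply to a code, including those inherited from its ancestors. It is not the RAG-Coding implementation: the class `CodeGraph`, the relation names (`parent`, `use_additional`, `excludes1`), and the handful of toy entries are assumptions made for this example, and the entries should not be read as an authoritative excerpt of the ICD-10-CM tabular list.

```python
from collections import defaultdict


class CodeGraph:
    """Toy knowledge graph over codes (hypothetical; illustration only)."""

    def __init__(self):
        # source code -> list of (relation, target code) pairs
        self._edges = defaultdict(list)

    def add_edge(self, source, relation, target):
        self._edges[source].append((relation, target))

    def neighbours(self, code, relation):
        """Targets reachable from `code` via edges of the given relation."""
        return [t for r, t in self._edges[code] if r == relation]

    def ancestors(self, code):
        """Follow 'parent' edges upward, nearest ancestor first."""
        chain = []
        parents = self.neighbours(code, "parent")
        while parents:
            code = parents[0]  # assumes each code has at most one parent
            chain.append(code)
            parents = self.neighbours(code, "parent")
        return chain

    def instructions(self, code):
        """Instructional (non-hierarchical) edges on the code and its ancestors."""
        notes = []
        for node in [code, *self.ancestors(code)]:
            notes.extend(
                (node, r, t) for r, t in self._edges[node] if r != "parent"
            )
        return notes


# Toy entries, assumed for illustration; consult the official tabular list.
graph = CodeGraph()
graph.add_edge("E11.22", "parent", "E11.2")
graph.add_edge("E11.2", "parent", "E11")
graph.add_edge("E11.22", "use_additional", "N18")
graph.add_edge("E11", "excludes1", "E10")

for note in graph.instructions("E11.22"):
    print(note)
```

In a retrieval setting, a structure of this kind would let an agent look up a candidate code and receive both its position in the hierarchy and the notes that constrain it, instead of relying on the model's internal recall of those rules.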