For clinical data integration and healthcare services, the HL7 FHIR standard has established itself as a desirable format for the interoperability of complex health data. Previous attempts to automate the translation of free-form clinical notes into structured FHIR resources address narrowly defined tasks and rely on modular approaches or on LLMs with instruction tuning and constrained decoding. Because those solutions frequently suffer from limited generalizability and structural nonconformity, we propose an end-to-end framework powered by LLM agents, code execution, and healthcare terminology database tools. Our solution, called Infherno, is designed to adhere to the FHIR document schema and competes well with a human baseline in predicting FHIR resources from unstructured text. The implementation features a front end for custom and synthetic data and supports both local and proprietary models, which facilitates clinical data integration and interoperability across institutions. Gemini 2.5-Pro excels in our evaluation on synthetic and clinical datasets, yet ambiguity and the feasibility of collecting ground-truth data remain open problems.