This study applies Large Language Models (LLMs) to two foundational Electronic Health Record (EHR) data science tasks: structured data querying (using programmatic languages, i.e., Python/Pandas) and information extraction from unstructured clinical text via a Retrieval-Augmented Generation (RAG) pipeline. We test the ability of LLMs to interact accurately with large structured datasets for analytics, and their reliability in extracting semantically correct information from free-text health records when supported by RAG. To this end, we present a flexible evaluation framework that automatically generates synthetic question-answer pairs tailored to the characteristics of each dataset or task. Experiments were conducted on a curated subset of MIMIC-III (four structured tables and one clinical note type) using a mix of locally hosted and API-based LLMs. Evaluation combined exact-match metrics, semantic similarity, and human judgment. Our findings demonstrate the potential of LLMs to support precise querying and accurate information extraction in clinical workflows.