Adoption and Use of LLMs at an Academic Medical Center

Nigam H. Shah, Nerissa Ambers, Abby Pandya, Timothy Keyes, Juan M. Banda, Srikar Nallan, Carlene Lugtu, Artem A. Trotsyuk, Suhana Bedi, Alyssa Unell, Miguel Fuentes, Francois Grolleau, Sneha S. Jain, Jonathan Chen, Devdutta Dash, Danton Char, Aditya Sharma, Duncan McElfresh, Patrick Scully, Vishanthan Kumar, Clancy Dennis, Connor OBrien, Satchi Mouniswamy, Elvis Jones, Krishna Jasti, Gunavathi Mannika Lakshmanan, Sree Ram Akula, Varun Kumar Singh, Ramesh Rajmanickam, Sudhir Sinha, Vicky Zhou, Xu Wang, Bilal Mawji, Joshua Ge, Wencheng Li, Travis Lyons, Jarrod Helzer, Vikas Kakkar, Ramesh Powar, Darren Batara, Cheryl Cordova, William Frederick, Olivia Tang, Phoebe Morgan, April S. Liang, Stephen P. Ma, Shivam Vedak, Dong-han Yao, Akshay Swaminathan, Mehr Kashyap, Brian Ng, Jamie Hellman, Nikesh Kotecha, Christopher Sharp, Gretchen Brown, Christian Lindmark, Anurang Revri, Michael A. Pfeffer

While large language models (LLMs) can support clinical documentation needs, standalone tools struggle with "workflow friction" from manual data entry. We developed ChatEHR, a system that enables the use of LLMs with the entire patient timeline spanning several years. ChatEHR supports both automations, which are static combinations of prompts and data that perform a fixed task, and interactive use in the electronic health record (EHR) via a user interface (UI). The resulting ability to sift through patient medical records for diverse use cases such as pre-visit chart review, screening for transfer eligibility, monitoring for surgical site infections, and chart abstraction redefines LLM use as an institutional capability. The system, accessible after user training, enables continuous monitoring and evaluation of LLM use. In 1.5 years, we built 7 automations, and 1,075 users completed training to become routine users of the UI, engaging in 23,000 sessions in the first 3 months after launch. For automations, being model-agnostic and accessing multiple types of data was essential for matching specific clinical or administrative tasks with the most appropriate LLM. Benchmark-based evaluations proved insufficient for monitoring and evaluating the UI, requiring new methods to monitor performance. Generation of summaries was the most frequent task in the UI, with an estimated 0.73 hallucinations and 1.60 inaccuracies per generation. The resulting mix of cost savings, time savings, and revenue growth required a value assessment framework to prioritize work and to quantify the impact of using LLMs. Initial estimates indicate $6M in savings in the first year of use, without quantifying the benefit of the better care offered. Such a "build-from-within" strategy provides an opportunity for health systems to maintain agency via a vendor-agnostic, internally governed LLM platform.
