Large language models (LLMs) have the potential to transform medicine, but real-world clinical scenarios contain extraneous information that can hinder performance. The rise of assistive technologies such as ambient dictation, which automatically generates draft notes from live patient encounters, may introduce additional noise, making it crucial to assess the ability of LLMs to filter relevant data. To investigate this, we developed MedDistractQA, a benchmark of USMLE-style questions embedded with simulated real-world distractions. Our findings show that distracting statements (polysemous words with clinical meanings used in a non-clinical context, or references to unrelated health conditions) can reduce LLM accuracy by up to 17.9%. Commonly proposed solutions for improving model performance, such as retrieval-augmented generation (RAG) and medical fine-tuning, did not mitigate this effect and in some cases introduced their own confounders, further degrading performance. These results suggest that LLMs natively lack the logical mechanisms necessary to distinguish relevant from irrelevant clinical information, posing challenges for real-world applications. MedDistractQA and our results highlight the need for robust mitigation strategies to enhance LLM resilience to extraneous information.
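To make the evaluation protocol concrete, the following is a minimal sketch, not the paper's actual harness, of how one might measure the accuracy drop caused by embedding a distractor into each question. The `MCQItem`, `ask_model`, and `build_prompt` names, along with the example vignette and distractor, are illustrative assumptions; a real run would plug an LLM API into `ask_model` and iterate over the MedDistractQA items.

```python
from dataclasses import dataclass

@dataclass
class MCQItem:
    stem: str                 # USMLE-style clinical vignette
    options: dict[str, str]   # option letter -> answer text
    answer: str               # correct option letter
    distractor: str           # clinically irrelevant statement to embed

def ask_model(prompt: str) -> str:
    """Placeholder for the LLM under test; swap in a real API call."""
    return "A"  # stub so the sketch runs end to end

def build_prompt(item: MCQItem, distract: bool) -> str:
    # Append the distracting statement to the vignette when requested.
    stem = f"{item.stem} {item.distractor}" if distract else item.stem
    choices = "\n".join(f"{k}. {v}" for k, v in sorted(item.options.items()))
    return f"{stem}\n{choices}\nAnswer with the option letter only."

def accuracy(items: list[MCQItem], distract: bool) -> float:
    hits = sum(
        ask_model(build_prompt(it, distract)).strip().upper().startswith(it.answer)
        for it in items
    )
    return hits / len(items)

if __name__ == "__main__":
    # Hypothetical item: the distractor uses a drug name in a non-clinical context.
    items = [
        MCQItem(
            stem="A 54-year-old man presents with crushing substernal chest pain.",
            options={"A": "Aortic dissection", "B": "Myocardial infarction"},
            answer="B",
            distractor="His nephew recently adopted a cat named Lipitor.",
        )
    ]
    clean = accuracy(items, distract=False)
    noisy = accuracy(items, distract=True)
    print(f"clean={clean:.2f} distracted={noisy:.2f} drop={clean - noisy:.2f}")
```

The key design point is that the clean and distracted runs share identical questions and answer options, so any difference in accuracy is attributable to the injected distractor alone.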