Language models (LMs) hold significant potential for clinical prediction tasks over electronic health records (EHRs), and thus for improving healthcare delivery. However, in these high-stakes applications, unreliable decisions can be costly, compromising patient safety and raising ethical concerns, which increases the need for sound uncertainty modeling of automated clinical predictions. To address this, we study the uncertainty quantification of LMs on EHR tasks in both white-box and black-box settings. We first quantify uncertainty in white-box models, where model parameters and output logits are accessible, and show that the proposed multi-tasking and ensembling methods effectively reduce model uncertainty on EHR tasks. Building on this idea, we extend our approach to black-box settings, including popular proprietary LMs such as GPT-4. We validate our framework on longitudinal clinical data from more than 6,000 patients across ten clinical prediction tasks. Results show that ensembling methods and multi-task prediction prompts reduce uncertainty across different scenarios. These findings increase model transparency in both white-box and black-box settings, thus advancing reliable AI in healthcare.
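To make the ensembling idea concrete, below is a minimal illustrative sketch (not the paper's actual method) of how predictive uncertainty is commonly quantified for an ensemble: average the class probabilities of several ensemble members and measure the entropy of the averaged prediction. The member probabilities here are hypothetical values for a single binary clinical prediction.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical class probabilities from three ensemble members
# for one binary clinical outcome (e.g., readmission: yes/no).
members = [
    [0.9, 0.1],
    [0.7, 0.3],
    [0.8, 0.2],
]

# Ensemble prediction: average the members' probability vectors.
n_classes = len(members[0])
ensemble = [sum(m[c] for m in members) / len(members) for c in range(n_classes)]

# Predictive entropy of the ensemble serves as an uncertainty score:
# lower entropy means a more confident (less uncertain) prediction.
uncertainty = entropy(ensemble)
```

The same entropy-based score can be applied in black-box settings by sampling multiple responses (e.g., over repeated prompts) and treating the empirical answer frequencies as the ensemble distribution.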