Predicting postoperative risks using large language models

From arXiv. Supplemental file available at: https://sites.wustl.edu/alba/files/2024/04/supplemental_materials-283eb0c14629614c.pdf. Models publicly available at: https://huggingface.co/cja5553/BJH-perioperative-notes-bioGPT

Predicting postoperative risk can inform effective care management and planning. We explored large language models (LLMs) for predicting postoperative risk from clinical texts using various tuning strategies. We used records spanning 84,875 patients from Barnes Jewish Hospital (BJH) between 2018 and 2021, with a mean follow-up duration, based on the length of postoperative ICU stay, of less than 7 days. Methods were replicated on the MIMIC-III dataset. Outcomes included 30-day mortality, pulmonary embolism (PE) and pneumonia. Three domain adaptation and finetuning strategies were implemented for three LLMs (BioGPT, ClinicalBERT and BioClinicalBERT): self-supervised objectives; incorporating labels with semi-supervised finetuning; and foundational modelling through multi-task learning. Model performance was compared using AUROC and AUPRC for classification tasks and MSE and R² for regression tasks. The cohort had a mean age of 56.9 (SD: 16.8) years; 50.3% were male and 74% were White. Pre-trained LLMs outperformed traditional word embeddings, with absolute maximal gains of 38.3% for AUROC and 14% for AUPRC. Adapting models through self-supervised finetuning further improved performance by 3.2% for AUROC and 1.5% for AUPRC. Incorporating labels into the finetuning procedure boosted performance further: compared to self-supervised finetuning, semi-supervised finetuning improved AUROC by 1.8% and AUPRC by 2%, and foundational modelling improved AUROC by 3.6% and AUPRC by 2.6%. Pre-trained clinical LLMs offer opportunities for postoperative risk prediction with unseen data, and the further improvements from finetuning suggest benefits in adapting pre-trained models to note-specific perioperative use cases. Incorporating labels can further boost performance. The superior performance of foundational models suggests the potential of task-agnostic learning towards generalizable LLMs in perioperative care.
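For readers less familiar with the four evaluation metrics named above, the following is a minimal pure-Python sketch of how AUROC, AUPRC (estimated here as average precision), MSE and R² can be computed. It is not the authors' evaluation code; the function names and the toy labels and scores are hypothetical and chosen only for illustration, and the average-precision helper assumes no tied scores.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive case is scored
    above a random negative case (ties count as half a win)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUPRC estimated as average precision: the mean of the precision
    values at the rank of each positive case. Assumes no tied scores."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return precision_sum / hits


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    labels = [0, 0, 1, 1]
    risk_scores = [0.1, 0.4, 0.35, 0.8]
    print("AUROC:", auroc(labels, risk_scores))
    print("AUPRC:", average_precision(labels, risk_scores))

    targets = [3.0, 5.0, 7.0]
    predictions = [2.5, 5.0, 8.0]
    print("MSE:", mse(targets, predictions))
    print("R2:", r2(targets, predictions))
```

AUPRC is often the more informative of the two classification metrics when the outcome is rare, as postoperative complications typically are, because it does not reward a model for correctly ranking the many negative cases.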

