Self-Supervised Time-to-Event Modeling with Structured Medical Records

Time-to-event (TTE) models are used in medicine and other fields for estimating the probability distribution of the time until a specific event occurs. TTE models provide many advantages over classification using fixed time horizons, including naturally handling censored observations, but require more parameters and are challenging to train in settings with limited labeled data. Existing approaches, e.g. proportional hazards or accelerated failure time, employ distributional assumptions to reduce parameters but are vulnerable to model misspecification. In this work, we address these challenges with MOTOR (Many Outcome Time Oriented Representations), a self-supervised model that leverages temporal structure found in collections of timestamped events in electronic health records (EHR) and health insurance claims. MOTOR uses a TTE pretraining objective that predicts the probability distribution of times when events occur, making it well-suited to transfer learning for medical prediction tasks. Having pretrained on EHR and claims data of up to 55M patient records (9B clinical events), we evaluate performance after finetuning for 19 tasks across two datasets. Task-specific models built using MOTOR improve time-dependent C statistics by 4.6% over state-of-the-art while greatly improving sample efficiency, achieving comparable performance to existing methods using only 5% of available task data.
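To make the point about censored observations concrete, here is a minimal sketch of a generic TTE estimator with a piecewise-constant hazard. It is an illustration only and is not taken from the abstract: the abstract does not specify MOTOR's architecture or the exact form of its objective, and the function names, bin edges, and toy data below are hypothetical. The sketch shows that a censored subject still contributes time at risk to the estimate, whereas a fixed-horizon classifier would have to discard or mislabel it.

```python
import math


def piecewise_exponential_mle(times, observed, bin_edges):
    """Estimate one constant hazard per time bin from right-censored data.

    times:     follow-up time per subject (event time or censoring time)
    observed:  True if the event was seen, False if the subject was censored
    bin_edges: increasing bin start points beginning at 0; the last bin is open-ended
    """
    n_bins = len(bin_edges)
    events = [0] * n_bins
    exposure = [0.0] * n_bins
    for t, seen in zip(times, observed):
        for j, start in enumerate(bin_edges):
            end = bin_edges[j + 1] if j + 1 < n_bins else math.inf
            if t <= start:
                break
            # Time at risk inside this bin counts for censored subjects too.
            exposure[j] += min(t, end) - start
            # Only an observed event that falls in this bin counts as an event.
            if seen and t <= end:
                events[j] += 1
    # Maximum-likelihood hazard per bin: events divided by total time at risk.
    return [e / x if x > 0 else 0.0 for e, x in zip(events, exposure)]


def survival(t, hazards, bin_edges):
    """Probability of remaining event-free up to time t under the fitted hazards."""
    cumulative_hazard = 0.0
    for j, start in enumerate(bin_edges):
        end = bin_edges[j + 1] if j + 1 < len(bin_edges) else math.inf
        if t <= start:
            break
        cumulative_hazard += hazards[j] * (min(t, end) - start)
    return math.exp(-cumulative_hazard)


# Hypothetical toy cohort: two observed events, two censored subjects.
times = [2.0, 5.0, 5.0, 8.0]
observed = [True, False, True, False]
bin_edges = [0.0, 4.0]

hazards = piecewise_exponential_mle(times, observed, bin_edges)
curve = [survival(t, hazards, bin_edges) for t in (0.0, 4.0, 6.0)]
```

The hazards and the survival curve describe the whole distribution of event times, so a risk estimate for any horizon can be read off afterwards without retraining. The price is one parameter per bin, which is the parameter growth the abstract mentions.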

