MedDiff: Generating Electronic Health Records using Accelerated Denoising Diffusion Model

Because of patient privacy concerns, machine learning research in healthcare has been slower and more limited than in other application domains. High-quality, realistic synthetic electronic health records (EHRs) can accelerate methodological development for research while mitigating the privacy concerns associated with data sharing. The current state-of-the-art models for synthetic EHR generation are generative adversarial networks (GANs), which are notoriously difficult to train and can suffer from mode collapse. Denoising Diffusion Probabilistic Models (DDPMs), a class of generative models inspired by statistical thermodynamics, have recently been shown to generate high-quality synthetic samples in certain domains. It is unknown whether they generalize to the generation of large-scale, high-dimensional EHRs. In this paper, we present a novel generative model based on diffusion models, the first successful application of such models to electronic health records. Our model includes a mechanism for class-conditional sampling that preserves label information. We also introduce a new sampling strategy that accelerates inference. We show empirically that our model outperforms existing state-of-the-art synthetic EHR generation methods.
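For readers unfamiliar with diffusion models, the following is a minimal sketch of the standard DDPM forward (noising) process that this class of models builds on. It is not MedDiff's implementation: the linear noise schedule, the step count of 1000, the toy binary vector standing in for an EHR record, and all function names are assumptions chosen for illustration. The abstract does not specify the class-conditional mechanism or the accelerated sampler, so neither is shown.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) in closed form."""
    a = alpha_bars[t]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]

rng = random.Random(0)                 # fixed seed so the sketch is repeatable
betas = linear_beta_schedule(1000)     # T = 1000 is an assumed step count
alpha_bars = cumulative_alphas(betas)

x0 = [1.0, 0.0, 0.0, 1.0, 1.0]         # hypothetical record: presence/absence of 5 codes
x_early = q_sample(x0, 9, alpha_bars, rng)    # still close to x0
x_late = q_sample(x0, 999, alpha_bars, rng)   # close to pure Gaussian noise
```

A generative model of this kind is trained to reverse the corruption step by step. Because the reverse chain normally visits every step, sampling is slow, which is the cost an accelerated sampling strategy aims to reduce.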

