The prevalence of artificial intelligence (AI) has heralded an era of healthcare democratisation that promises every stakeholder a new and better way of life. However, the advancement of clinical AI research is significantly hindered by the lack of data democratisation in healthcare. Truly democratising data for AI studies poses a two-fold challenge: (1) sensitive information in clinical data must be appropriately anonymised, and (2) AI-oriented clinical knowledge must flow freely across organisations. This paper considers a recent deep-learning advance, dataset condensation (DC), as a way to kill two birds with one stone in democratising healthcare data. The condensed data produced by DC can be viewed as statistical metadata. It abstracts the original clinical records and irreversibly conceals sensitive information at the individual level, yet it still preserves adequate knowledge for training deep neural networks (DNNs). Moreover, condensed data is smaller in volume and speeds up model training, which supports a more efficient system for sharing and circulating clinical knowledge, as data democratisation requires. Through experimental results and analysis on three healthcare datasets of different data types, we highlight DC's prospects for democratising clinical data, specifically electronic health records (EHRs), for AI research.