Artificial intelligence has made great progress in medical data analysis, but a lack of robustness and trustworthiness has kept these methods from being widely deployed. Because it is not possible to train networks that are accurate in all situations, models must recognize the situations in which they cannot operate confidently. Bayesian deep learning methods sample the model parameter space to estimate uncertainty, but the sampled parameters often share the same vulnerabilities, which adversarial attacks can exploit. We propose a novel ensemble approach based on feature decorrelation and Fourier partitioning that teaches networks diverse, complementary features, reducing the chance of perturbation-based fooling. We test our approach on electrocardiogram classification, demonstrating superior accuracy and confidence measurement on a variety of adversarial attacks. For example, our ensemble trained with both decorrelation and Fourier partitioning scored 50.18% inference accuracy and 48.01% uncertainty accuracy (area under the curve) on ε = 50 projected gradient descent attacks, while a conventionally trained ensemble scored 21.1% and 30.31% on these metrics, respectively. Our approach does not require expensive optimization with adversarial samples and can be scaled to large problems. These methods can easily be applied to other tasks to obtain more robust and trustworthy models.
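The abstract does not spell out how Fourier partitioning or the uncertainty estimate is computed, so the following is only a minimal sketch under stated assumptions: each ensemble member is assumed to receive one of several equal-width, disjoint frequency bands of the input signal, and uncertainty is assumed to be the entropy of the averaged member predictions. The function names `fourier_partition` and `ensemble_uncertainty` are hypothetical and are not taken from the paper.

```python
import numpy as np


def fourier_partition(signal, n_bands):
    """Split a 1-D real signal into n_bands signals with disjoint frequency support.

    Assumption: equal-width bands over the one-sided spectrum; the paper
    may partition the spectrum differently.
    """
    spectrum = np.fft.rfft(signal)
    # Band edges as integer indices into the one-sided spectrum.
    edges = np.linspace(0, spectrum.size, n_bands + 1).astype(int)
    parts = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(spectrum)
        masked[lo:hi] = spectrum[lo:hi]  # keep only this band
        parts.append(np.fft.irfft(masked, n=signal.size))
    return np.stack(parts)  # shape: (n_bands, len(signal))


def ensemble_uncertainty(member_probs):
    """Average class probabilities over members and return their entropy.

    Assumption: predictive entropy stands in for the paper's
    uncertainty measure, which the abstract does not define.
    member_probs has shape (n_members, n_classes).
    """
    mean_probs = member_probs.mean(axis=0)
    entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-12))
    return mean_probs, entropy
```

Because the bands are disjoint and together cover the whole spectrum, the partitioned signals sum back to the original input, so no information is discarded across the ensemble; a perturbation concentrated in one band would, under this assumption, reach only the member trained on that band.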