Artificial intelligence has made great progress in medical data analysis, but the lack of robustness and trustworthiness has kept these methods from being widely deployed. As it is not possible to train networks that are accurate in all scenarios, models must recognize situations where they cannot operate confidently. Bayesian deep learning methods sample the model parameter space to estimate uncertainty, but these parameters are often subject to the same vulnerabilities, which can be exploited by adversarial attacks. We propose a novel ensemble approach based on feature decorrelation and Fourier partitioning for teaching networks diverse, complementary features, reducing the chance of perturbation-based fooling. We test our approach on single- and multi-channel electrocardiogram classification, and adapt adversarial training and DVERGE into the Bayesian ensemble framework for comparison. Our results indicate that the combination of decorrelation and Fourier partitioning generally maintains performance on unperturbed data while demonstrating superior robustness and uncertainty estimation on projected gradient descent and smooth adversarial attacks of various magnitudes. Furthermore, our approach does not require expensive optimization with adversarial samples, adding much less compute to the training process than adversarial training or DVERGE. These methods can be applied to other tasks for more robust and trustworthy models.
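The abstract does not specify how the Fourier partitioning or the decorrelation term is implemented, so the following is only a minimal sketch under stated assumptions: the spectrum of a 1-D signal is split into contiguous, disjoint frequency bands (one per hypothetical ensemble member), and decorrelation is measured as the mean squared cross-correlation between two feature matrices. The function names `fourier_partition` and `decorrelation_penalty`, the equal-width band split, and the form of the penalty are all illustrative choices, not the authors' method.

```python
import numpy as np


def fourier_partition(signal, n_parts):
    """Split a 1-D real signal into n_parts band-limited components.

    Hypothetical helper: the equal-width, contiguous band split is an
    assumption made for illustration only.
    """
    spectrum = np.fft.rfft(signal)
    # Contiguous, disjoint frequency-bin index ranges covering the spectrum.
    bands = np.array_split(np.arange(spectrum.size), n_parts)
    parts = []
    for idx in bands:
        masked = np.zeros_like(spectrum)
        masked[idx] = spectrum[idx]  # keep only this band's coefficients
        parts.append(np.fft.irfft(masked, n=signal.size))
    return np.stack(parts)  # shape: (n_parts, len(signal))


def decorrelation_penalty(feats_a, feats_b, eps=1e-8):
    """Mean squared cross-correlation between two (batch, dim) feature matrices.

    Assumed form of a decorrelation term: lower values mean the two
    feature sets are less linearly related.
    """
    a = (feats_a - feats_a.mean(axis=0)) / (feats_a.std(axis=0) + eps)
    b = (feats_b - feats_b.mean(axis=0)) / (feats_b.std(axis=0) + eps)
    corr = a.T @ b / feats_a.shape[0]  # (dim, dim) cross-correlation matrix
    return float(np.mean(corr ** 2))
```

Because the bands are disjoint and together cover the full spectrum, summing the components returned by `fourier_partition` recovers the original signal, so each hypothetical ensemble member would see a different, complementary slice of the input. In a training loop, a term like `decorrelation_penalty` would be added to the task loss for pairs of members; how it is weighted is not stated in the abstract.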