Enhancement attacks in biomedical machine learning

The prevalence of machine learning in biomedical research is growing rapidly, yet the trustworthiness of such research is often overlooked. While some previous works have investigated the ability of adversarial attacks to degrade model performance in medical imaging, the ability to falsely improve performance via recently developed "enhancement attacks" may be a greater threat to biomedical machine learning. In the spirit of developing attacks to better understand trustworthiness, we developed two techniques to drastically enhance the prediction performance of classifiers with minimal changes to features: 1) general enhancement of prediction performance, and 2) enhancement of a particular method over another. Our general enhancement framework falsely improved classifiers' accuracy from 50% to almost 100% while maintaining high feature similarity between the original and enhanced data (Pearson's r > 0.99). Similarly, the method-specific enhancement framework was effective in falsely improving the performance of one method over another. For example, a simple neural network outperformed logistic regression by 17% on our enhanced dataset, although no performance difference was present in the original dataset. Crucially, the original and enhanced data remained highly similar (r = 0.99). Our results demonstrate that minor data manipulations can achieve any desired prediction performance, which presents an interesting ethical challenge for the future of biomedical machine learning. These findings emphasize the need for more robust data provenance tracking and other precautionary measures to ensure the integrity of biomedical machine learning research.
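To make the central claim concrete (small feature changes, near-perfect accuracy), here is a toy sketch. It is not the authors' enhancement framework, whose details are not given here. It assumes Gaussian features with labels that carry no information, a nearest-centroid classifier scored by 5-fold cross-validation, and a simple additive perturbation that nudges each sample along one fixed random pattern in a direction set by its label. The names `eps`, `pattern`, `nearest_centroid_cv_accuracy`, and `rowwise_pearson` are hypothetical and chosen only for illustration.

```python
import numpy as np


def nearest_centroid_cv_accuracy(X, y, n_folds=5, seed=0):
    """K-fold cross-validated accuracy of a nearest-centroid classifier."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    correct = 0
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        X_tr, y_tr = X[train_mask], y[train_mask]
        # Class centroids are estimated from the training fold only.
        c0 = X_tr[y_tr == 0].mean(axis=0)
        c1 = X_tr[y_tr == 1].mean(axis=0)
        d0 = np.linalg.norm(X[test_idx] - c0, axis=1)
        d1 = np.linalg.norm(X[test_idx] - c1, axis=1)
        pred = (d1 < d0).astype(int)
        correct += int((pred == y[test_idx]).sum())
    return correct / len(y)


def rowwise_pearson(A, B):
    """Pearson's r between corresponding rows (samples) of A and B."""
    A_c = A - A.mean(axis=1, keepdims=True)
    B_c = B - B.mean(axis=1, keepdims=True)
    num = (A_c * B_c).sum(axis=1)
    den = np.sqrt((A_c ** 2).sum(axis=1) * (B_c ** 2).sum(axis=1))
    return num / den


rng = np.random.default_rng(42)
n_samples, n_features = 200, 1000
X = rng.standard_normal((n_samples, n_features))
y = rng.integers(0, 2, size=n_samples)  # labels are independent of X

# Hypothetical enhancement: shift every sample by +/- eps * pattern,
# with the sign taken from its label (class 1 -> +, class 0 -> -).
eps = 0.1
pattern = rng.standard_normal(n_features)
signs = 2 * y - 1
X_enh = X + eps * signs[:, None] * pattern[None, :]

acc_orig = nearest_centroid_cv_accuracy(X, y)
acc_enh = nearest_centroid_cv_accuracy(X_enh, y)
r = rowwise_pearson(X, X_enh)
print(f"accuracy: original {acc_orig:.2f}, enhanced {acc_enh:.2f}")
print(f"min per-sample Pearson r: {r.min():.4f}")
```

The per-feature shift (0.1) is small compared with each feature's unit variance, so every enhanced sample stays almost perfectly correlated with its original. Summed over 1,000 features, however, the label-dependent shift is large enough for the classifier to separate the classes. Cross-validation does not flag this, because the manipulation is already baked into the dataset it is given.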
