The recent success of machine learning (ML) has been fueled by the increasing availability of computing power and large amounts of data in many different applications. However, the trustworthiness of the resulting models can be compromised when such data is maliciously manipulated to mislead the learning process. In this article, we first review poisoning attacks that compromise the training data used to learn ML models, including attacks that aim to reduce overall model performance, manipulate the predictions on specific test samples, and even implant backdoors in the model. We then discuss how to mitigate these attacks using basic security principles or by deploying ML-oriented defensive mechanisms. We conclude by formulating open challenges that hinder the development of testing methods and benchmarks suitable for assessing and improving the trustworthiness of ML models against data poisoning attacks.