Machine Learning (ML) algorithms are vulnerable to poisoning attacks, where a fraction of the training data is manipulated to deliberately degrade the algorithms' performance. Optimal attacks can be formulated as bilevel optimization problems and help to assess the algorithms' robustness in worst-case scenarios. We show that current approaches, which typically assume that hyperparameters remain constant, lead to an overly pessimistic view of the algorithms' robustness and of the impact of regularization. We propose a novel optimal attack formulation that considers the effect of the attack on the hyperparameters and models the attack as a multiobjective bilevel optimization problem. This allows us to formulate optimal attacks, learn hyperparameters, and evaluate robustness under worst-case conditions. We apply this attack formulation to several ML classifiers using $L_2$ and $L_1$ regularization. Our evaluation on multiple datasets confirms the limitations of previous strategies and demonstrates the benefits of using $L_2$ and $L_1$ regularization to dampen the effect of poisoning attacks.
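To make the structure of the proposed formulation concrete, the multiobjective bilevel problem can be sketched as below. The notation is an assumption introduced here for illustration and is not taken from the abstract: $\mathcal{D}_{\mathrm{tr}}$, $\mathcal{D}_{\mathrm{val}}$ and $\mathcal{D}_{\mathrm{target}}$ denote the clean training set, the defender's validation set and the attacker's target set; $\mathcal{D}_p$ is the set of poisoning points, restricted to a feasible set $\Phi$; $\lambda$ is the regularization hyperparameter; $\mathcal{L}$ is the learner's loss, $\mathcal{A}$ the attacker's objective and $\Omega$ the $L_2$ or $L_1$ penalty.

```latex
\begin{aligned}
&\min_{\lambda} \; \mathcal{L}\bigl(\mathcal{D}_{\mathrm{val}}, \mathbf{w}^{\star}\bigr),
\qquad
\max_{\mathcal{D}_p \in \Phi} \; \mathcal{A}\bigl(\mathcal{D}_{\mathrm{target}}, \mathbf{w}^{\star}\bigr) \\
&\text{s.t.} \quad
\mathbf{w}^{\star} \in \arg\min_{\mathbf{w}} \;
\mathcal{L}\bigl(\mathcal{D}_{\mathrm{tr}} \cup \mathcal{D}_p, \mathbf{w}\bigr)
+ \lambda \, \Omega(\mathbf{w})
\end{aligned}
```

Under this sketch, the fixed-hyperparameter approaches criticized above correspond to dropping the outer minimization and holding $\lambda$ at a value chosen on clean data, so the attacker's maximization ignores how the learner would re-tune $\lambda$ once the training set is poisoned.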