This paper studies a setting in which a malicious actor uses a multi-armed attack strategy to manipulate data samples, giving the attacker several ways to inject noise into the dataset. Our central objective is to protect the data by detecting any alteration of the input. We take a deliberately conservative stance and assume that the defender has significantly less information than the attacker. Specifically, the defender cannot use any data samples to train a defense model or to verify the integrity of the channel. Instead, the defender relies exclusively on a set of pre-existing detectors available ``off the shelf''. To address this challenge, we derive a novel information-theoretic defense that optimally aggregates the decisions of these detectors without requiring any training data. For empirical evaluation, we consider a practical use case in which the attacker has access to a pre-trained classifier and launches well-known adversarial attacks against it. Our experiments demonstrate the effectiveness of the proposed solution, even in settings that deviate from the optimal setup.