With the exponential growth of data and its crucial impact on our lives and decision-making, the integrity of data has become a significant concern. Malicious data poisoning attacks, where false values are injected into the data, can disrupt machine learning processes and lead to severe consequences. To mitigate these attacks, distance-based defenses, such as trimming, have been proposed, but they can be easily evaded by white-box attackers. The evasiveness and effectiveness of poisoning attack strategies are two sides of the same coin, making game theory a promising approach. However, existing game-theoretical models often overlook the complexities of online data poisoning attacks, where strategies must adapt to the dynamic process of data collection. In this paper, we present an interactive game-theoretical model to defend against online data manipulation attacks using the trimming strategy. Our model accommodates a complete strategy space, making it applicable to strongly evasive and colluding adversaries. Leveraging the principle of least action and the Euler-Lagrange equation from theoretical physics, we derive an analytical model for the game-theoretic process. To demonstrate its practical usage, we present a case study in a privacy-preserving data collection system under local differential privacy where a non-deterministic utility function is adopted. Two strategies are devised from this analytical model, namely, Tit-for-tat and Elastic. We conduct extensive experiments on real-world datasets, which showcase the effectiveness and accuracy of these two strategies.
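The tension between evasiveness and effectiveness can be illustrated with a minimal sketch of a distance-based trimming defense. This is not the game-theoretic model, nor the Tit-for-tat or Elastic strategy; the function names, the median-based distance, the trimming budget, and the sample values are all assumptions made for illustration.

```python
import statistics


def trim(values, num_trim):
    """Drop the `num_trim` values farthest from the median (hypothetical defense)."""
    center = statistics.median(values)
    # Rank values by distance to the median, closest first.
    ranked = sorted(values, key=lambda v: abs(v - center))
    return ranked[: len(values) - num_trim]


def trimmed_mean(values, num_trim):
    """Mean of the values that survive trimming."""
    return statistics.mean(trim(values, num_trim))


# Illustrative data: eight honest reports with mean 10.0 (assumed values).
honest = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0, 11.5, 8.5]

# A naive attacker injects extreme values, which trimming removes.
naive_poison = [100.0, 100.0]

# A white-box attacker who knows the trimming rule stays just inside
# the range of honest values, so the poison survives trimming.
evasive_poison = [11.4, 11.4]

if __name__ == "__main__":
    print("honest mean:       ", statistics.mean(honest))
    print("naive, trimmed:    ", trimmed_mean(honest + naive_poison, 2))
    print("evasive, trimmed:  ", trimmed_mean(honest + evasive_poison, 2))
```

In this sketch the naive poison is fully trimmed, while the evasive poison survives and shifts the estimate, but by much less than the untrimmed extreme values would have. The attacker trades impact for stealth, which is the trade-off that motivates a game-theoretic treatment.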