Addressing data integrity challenges, such as unlearning the effects of data poisoning after model training, is necessary for the reliable deployment of machine learning models. State-of-the-art influence functions, such as EK-FAC and TRAK, often fail to accurately attribute abnormal model behavior to the specific poisoned training data responsible for the data poisoning attack. In addition, traditional unlearning algorithms often struggle to effectively remove the influence of poisoned samples, particularly when only a few affected examples can be identified. To address these challenges, we introduce $\Delta$-Influence, a novel approach that leverages influence functions to trace abnormal model behavior back to the responsible poisoned training data using as few as one poisoned test example. $\Delta$-Influence applies data transformations that sever the link between poisoned training data and compromised test points without significantly affecting clean data. This allows $\Delta$-Influence to detect large negative shifts in influence scores following data transformations, a phenomenon we term influence collapse, thereby accurately identifying poisoned training data. Unlearning this subset, e.g., through retraining, effectively eliminates the data poisoning. We validate our method across three vision-based poisoning attacks and three datasets, benchmarking against five detection algorithms and five unlearning strategies. $\Delta$-Influence consistently achieves the best unlearning across all settings, demonstrating the promise of influence functions for corrective unlearning. Our code is publicly available at: https://github.com/Ruby-a07/delta-influence
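The influence-collapse detection described above can be illustrated with a minimal sketch. Here, `detect_influence_collapse`, the standard-deviation thresholding rule, and the toy influence scores are all illustrative assumptions for exposition, not the paper's exact implementation: the idea is simply that poisoned training points show a large negative shift in influence on a compromised test example once the poison-to-trigger link is severed by a data transformation.

```python
import numpy as np

def detect_influence_collapse(scores_before, scores_after, k=3.0):
    """Flag training points whose influence score on a compromised test
    example drops sharply after a data transformation.

    A point is flagged when its shift falls more than k standard
    deviations below the mean shift over all training points
    (an assumed thresholding rule, for illustration only).
    """
    delta = np.asarray(scores_after) - np.asarray(scores_before)
    mu, sigma = delta.mean(), delta.std()
    return np.where(delta < mu - k * sigma)[0]

# Toy example: 100 training points; clean points shift only slightly,
# while two hypothetical poisoned points (indices 10 and 42) collapse.
rng = np.random.default_rng(0)
before = rng.normal(0.0, 1.0, 100)
shift = rng.normal(0.0, 0.05, 100)
shift[[10, 42]] = -5.0  # influence collapse for the poisoned points
after = before + shift

flagged = detect_influence_collapse(before, after)
print(sorted(flagged))  # expected: [10, 42]
```

The flagged subset would then be passed to an unlearning procedure (e.g., retraining without those points) to remove the poisoning effect.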