How to Sift Out a Clean Data Subset in the Presence of Data Poisoning?

Given the volume of data needed to train modern machine learning models, practitioners increasingly rely on external data suppliers. However, incorporating external data poses data poisoning risks, wherein attackers manipulate their data to degrade model utility or integrity. Most poisoning defenses presume access to a set of clean data (a base set). This assumption has long been taken for granted, but the fast-growing research on stealthy poisoning attacks raises a question: can defenders really identify a clean subset within a contaminated dataset to support defenses? This paper starts by examining how defenses are affected when poisoned samples are mistakenly mixed into the base set. We analyze five defenses and find that their performance deteriorates dramatically with less than 1% poisoned points in the base set. These findings suggest that sifting out a base set with high precision is key to these defenses' performance. Motivated by these observations, we study how precisely existing automated tools and human inspection identify clean data in the presence of data poisoning. Unfortunately, neither achieves the precision needed. Worse yet, many of the outcomes are worse than random selection. In addition to uncovering this challenge, we propose a practical countermeasure, Meta-Sift. Our method is based on the insight that the poisoned samples of existing attacks shift away from the clean data distribution. Hence, training on the clean portion of a dataset and testing on the corrupted portion results in high prediction loss. Leveraging this insight, we formulate a bilevel optimization to identify clean data and further introduce a suite of techniques to improve efficiency and precision. Our evaluation shows that Meta-Sift can sift a clean base set with 100% precision under a wide range of poisoning attacks. The selected base set is large enough to give rise to successful defenses.
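
The insight behind the method (a model trained on clean data incurs high prediction loss on poisoned data) can be illustrated with a small toy experiment. The sketch below is not Meta-Sift and does not implement the bilevel optimization; it assumes a synthetic two-class Gaussian dataset, a simple label-flipping attack, a plain logistic regression, and a trusted clean training half, none of which come from the paper. All function names and parameters (`run_demo`, `base_size`, and so on) are hypothetical. It also reports the precision of a naively sifted base set, that is, the fraction of selected samples that are actually clean.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logreg(x, y, lr=0.1, steps=500):
    """Plain gradient descent on the mean logistic loss."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        err = sigmoid(x @ w + b) - y  # gradient of the loss w.r.t. each logit
        w -= lr * (x.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b


def per_sample_loss(w, b, x, y):
    """Binary cross-entropy of each sample under the fitted model."""
    p = np.clip(sigmoid(x @ w + b), 1e-12, 1.0 - 1e-12)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def run_demo(seed=0, n_per_class=200, n_poison=20, base_size=100):
    rng = np.random.default_rng(seed)

    # Clean data: two well-separated Gaussian classes (toy assumption).
    x_clean = np.vstack([
        rng.normal(-2.0, 1.0, size=(n_per_class, 2)),
        rng.normal(2.0, 1.0, size=(n_per_class, 2)),
    ])
    y_clean = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])

    # Poison: class-1 features carrying the flipped label 0 (toy attack).
    x_poison = rng.normal(2.0, 1.0, size=(n_poison, 2))
    y_poison = np.zeros(n_poison)

    # Assumption: half of the clean data is already trusted and used to train.
    order = rng.permutation(len(y_clean))
    train_idx = order[: len(order) // 2]
    held_idx = order[len(order) // 2:]
    w, b = fit_logreg(x_clean[train_idx], y_clean[train_idx])

    clean_loss = per_sample_loss(w, b, x_clean[held_idx], y_clean[held_idx])
    poison_loss = per_sample_loss(w, b, x_poison, y_poison)

    # Naive sifting: keep the base_size lowest-loss points of the mixed pool.
    pool_loss = np.concatenate([clean_loss, poison_loss])
    is_clean = np.concatenate([
        np.ones(len(clean_loss), dtype=bool),
        np.zeros(n_poison, dtype=bool),
    ])
    selected = np.argsort(pool_loss)[:base_size]

    return {
        "mean_clean_loss": float(clean_loss.mean()),
        "mean_poison_loss": float(poison_loss.mean()),
        "precision": float(is_clean[selected].mean()),
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value:.4f}")
```

The toy setup sidesteps the hard part: it starts from a training half that is already known to be clean, which is exactly what a defender lacks in practice. The paper's bilevel formulation is what removes that requirement, so this sketch should be read only as an illustration of the loss gap and of how base-set precision is measured.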
