Machine learning is susceptible to poisoning attacks, in which an attacker controls a small fraction of the training data and chooses that data to induce behavior in the trained model that the model developer did not intend. We consider a realistic setting in which an adversary who can insert a limited number of data points attempts to control the model's behavior on a specific subpopulation. Inspired by previous observations on the disparate effectiveness of random label-flipping attacks across subpopulations, we investigate the properties that affect the effectiveness of state-of-the-art poisoning attacks against different subpopulations. For a family of 2-dimensional synthetic datasets, we empirically find that dataset separability plays a dominant role in subpopulation vulnerability for less separable datasets. For well-separated datasets, however, vulnerability depends more on the properties of the individual subpopulation. We further find that a crucial subpopulation property is captured by the difference in loss on the clean dataset between the clean model and a target model that misclassifies the subpopulation: a subpopulation is much easier to attack when this loss difference is small. This property also generalizes to high-dimensional benchmark datasets. For the Adult benchmark dataset, we show that we can find semantically meaningful subpopulation properties that are related to the susceptibilities of a selected group of subpopulations. The results in this paper are accompanied by a fully interactive web-based visualization of subpopulation poisoning attacks, available at https://uvasrg.github.io/visualizing-poisoning.
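The loss-difference property described above can be made concrete with a minimal sketch. Everything specific here is an assumption made for illustration rather than the setup used in the paper: the models are linear classifiers given as hypothetical `(w, b)` pairs, the loss is the mean logistic loss with labels in {-1, +1}, the difference is taken as target loss minus clean loss, and the toy data and both models are hand-picked rather than trained.

```python
import numpy as np


def mean_logistic_loss(w, b, X, y):
    """Mean logistic loss of the linear model (w, b) on (X, y), labels in {-1, +1}."""
    margins = y * (X @ w + b)
    # log(1 + exp(-m)), computed in a numerically stable way
    return float(np.mean(np.logaddexp(0.0, -margins)))


def loss_difference(clean_model, target_model, X_clean, y_clean):
    """Loss of the target model minus loss of the clean model, both on clean data.

    The sign convention (target minus clean) is an assumption of this sketch.
    """
    w_c, b_c = clean_model
    w_t, b_t = target_model
    return (mean_logistic_loss(w_t, b_t, X_clean, y_clean)
            - mean_logistic_loss(w_c, b_c, X_clean, y_clean))


# Toy 2-dimensional dataset: two Gaussian clusters (illustrative only).
rng = np.random.default_rng(0)
X_neg = rng.normal(loc=[-2.0, 0.0], scale=1.0, size=(200, 2))
X_pos = rng.normal(loc=[2.0, 0.0], scale=1.0, size=(200, 2))
X_clean = np.vstack([X_neg, X_pos])
y_clean = np.concatenate([-np.ones(200), np.ones(200)])

# Hand-picked models: the "clean" boundary is x1 = 0; the hypothetical target
# model shifts it to x1 = -1.5, so negative points near the boundary are
# misclassified as positive.
clean_model = (np.array([1.0, 0.0]), 0.0)
target_model = (np.array([1.0, 0.0]), 1.5)

gap = loss_difference(clean_model, target_model, X_clean, y_clean)
print(f"loss difference on clean data: {gap:.3f}")
```

Under these assumptions, a smaller `gap` means the target model is already nearly as good as the clean model on the clean data, which matches the abstract's observation that such subpopulations are easier to attack.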