When an adversary injects poison samples into the training data of a machine learning model, privacy attacks such as membership inference, which infers whether a given sample was included in the model's training set, become more effective because poisoning turns the target sample into an outlier. However, such attacks can be detected because the poison samples degrade the model's inference accuracy. In this paper, we discuss a \textit{backdoor-assisted membership inference attack}, a novel membership inference attack based on backdoors, which make the model return the adversary's expected output for a triggered sample. Through experiments on an academic benchmark dataset, we obtain three key insights. First, we demonstrate that the backdoor-assisted membership inference attack is unsuccessful. Second, analyzing loss distributions to understand why the attack fails, we find that backdoors cannot separate the loss distributions of training and non-training samples; in other words, backdoors do not affect the distribution of clean samples. Third, we show that poison samples and triggered samples produce neuron activations with different distributions: unlike poison samples, backdoors make any clean sample an inlier. As a result, we confirm that backdoors cannot assist membership inference.
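The role of loss distributions in the second insight can be illustrated with a minimal sketch of a loss-threshold membership inference test. This is not the paper's attack or experimental setup: the function names, the thresholding rule, and the synthetic exponential loss values are all assumptions made for illustration. The sketch only shows that such a test succeeds when member and non-member losses are separated and falls to chance level when the two distributions overlap.

```python
import numpy as np


def loss_threshold_attack(member_losses, nonmember_losses, threshold):
    """Guess 'member' when loss < threshold; return balanced accuracy.

    Hypothetical helper for illustration, not the paper's method.
    """
    member_losses = np.asarray(member_losses, dtype=float)
    nonmember_losses = np.asarray(nonmember_losses, dtype=float)
    true_positive_rate = np.mean(member_losses < threshold)
    true_negative_rate = np.mean(nonmember_losses >= threshold)
    return 0.5 * (true_positive_rate + true_negative_rate)


def best_threshold_accuracy(member_losses, nonmember_losses):
    """Try every observed loss as a threshold and keep the best accuracy."""
    candidates = np.concatenate([member_losses, nonmember_losses])
    return max(
        loss_threshold_attack(member_losses, nonmember_losses, t)
        for t in candidates
    )


rng = np.random.default_rng(0)

# Synthetic losses (assumed, not measured): members have much lower loss.
separated_acc = best_threshold_accuracy(
    rng.exponential(0.1, 1000), rng.exponential(1.0, 1000)
)

# Synthetic losses (assumed): members and non-members are indistinguishable.
overlapping_acc = best_threshold_accuracy(
    rng.exponential(0.5, 1000), rng.exponential(0.5, 1000)
)

print(f"separated distributions:   {separated_acc:.3f}")
print(f"overlapping distributions: {overlapping_acc:.3f}")
```

Balanced accuracy near 0.5 corresponds to random guessing, which is the regime the abstract describes when backdoors fail to separate the loss distributions of training and non-training samples.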