In Membership Inference (MI) attacks, the adversary tries to determine whether an instance was used to train a machine learning (ML) model. MI attacks are a major privacy concern when private data are used to train ML models. Most MI attacks in the literature exploit the fact that ML models are trained to fit the training data well, and thus have very low loss on training instances. Most defenses against MI attacks therefore try to make the model fit the training data less well. Doing so, however, generally results in lower accuracy. We observe that training instances have different degrees of vulnerability to MI attacks: most instances will have low loss even when not included in training, and the model can fit these instances well without concern for MI attacks. An effective defense therefore only needs to (possibly implicitly) identify the instances that are vulnerable to MI attacks and avoid overfitting them. A major challenge is how to achieve this effect in an efficient training process. Leveraging two distinct recent advances in representation learning, counterfactually-invariant representations and subspace learning methods, we introduce a novel Membership-Invariant Subspace Training (MIST) method to defend against MI attacks. MIST avoids overfitting the vulnerable instances without significantly affecting other instances. We have conducted extensive experimental studies, comparing MIST with various other state-of-the-art (SOTA) MI defenses against several SOTA MI attacks. We find that MIST outperforms the other defenses while incurring minimal loss of testing accuracy.