MIST: Defending Against Membership Inference Attacks Through Membership-Invariant Subspace Training

In Membership Inference (MI) attacks, the adversary tries to determine whether an instance was used to train a machine learning (ML) model. MI attacks are a major privacy concern when private data is used to train ML models. Most MI attacks in the literature exploit the fact that ML models are trained to fit the training data well and therefore have very low loss on training instances. Most defenses against MI attacks consequently try to make the model fit the training data less well, which generally lowers accuracy. We observe that training instances differ in their degree of vulnerability to MI attacks. Most instances have low loss even when they are not included in training, so the model can fit them well without raising MI concerns. An effective defense only needs to identify, possibly implicitly, the instances that are vulnerable to MI attacks and avoid overfitting them. A major challenge is how to achieve this effect in an efficient training process. Leveraging two distinct recent advances in representation learning, counterfactually-invariant representations and subspace learning methods, we introduce a novel Membership-Invariant Subspace Training (MIST) method to defend against MI attacks. MIST avoids overfitting the vulnerable instances without significant impact on the other instances. We have conducted extensive experimental studies comparing MIST with various other state-of-the-art (SOTA) MI defenses against several SOTA MI attacks. We find that MIST outperforms the other defenses while incurring only a minimal reduction in testing accuracy.
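To make the observation about per-instance vulnerability concrete, the following is a minimal illustrative sketch of a loss-threshold MI attack. It is not the MIST method and not any specific attack from the evaluation. The function names `loss_threshold_attack` and `vulnerability_gap`, the threshold of 0.5, the gap cutoff of 1.0, and all loss values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def loss_threshold_attack(losses: Sequence[float], threshold: float) -> List[bool]:
    """Guess 'member' for every instance whose loss is below the threshold."""
    return [loss < threshold for loss in losses]


def vulnerability_gap(loss_in: Sequence[float], loss_out: Sequence[float]) -> List[float]:
    """Per-instance difference between the loss when the instance is held out
    of training and the loss when it is included in training."""
    return [l_out - l_in for l_in, l_out in zip(loss_in, loss_out)]


# Hypothetical per-instance losses (assumed values, not experimental results).
loss_in = [0.02, 0.03, 0.05, 0.04]   # each instance included in training
loss_out = [0.03, 0.04, 2.10, 1.70]  # same instances held out of training

# Instances 0 and 1 are guessed 'member' in both cases, so the attack learns
# nothing about them; instances 2 and 3 are separated by the threshold.
guesses_in = loss_threshold_attack(loss_in, threshold=0.5)
guesses_out = loss_threshold_attack(loss_out, threshold=0.5)

# A large gap marks an instance as vulnerable (the cutoff 1.0 is an assumption).
gaps = vulnerability_gap(loss_in, loss_out)
vulnerable = [i for i, gap in enumerate(gaps) if gap > 1.0]
```

In this toy setting only the last two instances are exposed by the attack, which is the sense in which a defense needs to avoid overfitting only a subset of the training data.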
