Neighborhood Blending: A Lightweight Inference-Time Defense Against Membership Inference Attacks

The widespread adoption of Machine Learning as a Service (MLaaS), particularly in sensitive settings, has raised considerable privacy concerns. Chief among them are membership inference attacks (MIAs), which exploit differences in a model's behavior on training and non-training data to determine whether a specific record was part of the training set. Existing defenses, such as adversarial regularization, DP-SGD, and MemGuard, mitigate these threats but often at a cost: reduced utility, higher computational requirements, or inconsistent protection across attack types. In this paper, we introduce Neighborhood Blending, a novel inference-time defense that mitigates MIAs without retraining the model or incurring significant computational overhead. Our approach operates post-training, smoothing the model's confidence outputs based on the neighborhood of the queried sample. By averaging predictions from similar training samples selected via differentially private sampling, it produces a consistent confidence pattern that renders members and non-members indistinguishable to an adversary. Notably, Neighborhood Blending preserves label integrity (zero label loss) and keeps utility high through an adaptive, "pay-as-you-go" distortion strategy. The approach is model-agnostic, practical, and lightweight. Through extensive experiments across diverse datasets and models, we show that our defense significantly reduces MIA success rates while preserving model performance, and that it outperforms both post-hoc defenses such as MemGuard and training-time techniques such as DP-SGD in utility retention.
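
The abstract gives no algorithmic details, so the following is only an illustrative sketch of the general idea (blend a query's confidence vector with the mean prediction of nearby training samples while keeping the predicted label fixed), not the authors' method. Everything specific here is an assumption: the function name `neighborhood_blend`, the use of Euclidean distance in a feature space, the exponential-mechanism-style sampler standing in for "differentially private sampling" (no formal privacy guarantee is claimed for it), and the halving of the hypothetical blending weight `alpha` to preserve the label. It does not attempt to reproduce the paper's "pay-as-you-go" distortion strategy.

```python
import numpy as np

def neighborhood_blend(query_feat, query_probs, train_feats, train_probs,
                       k=5, epsilon=1.0, alpha=0.5, rng=None):
    """Blend a query's confidence vector with its neighbors' mean prediction.

    All parameters are hypothetical choices made for this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    query_probs = np.asarray(query_probs, dtype=float)

    # Closer training samples get a higher score (negative distance).
    dists = np.linalg.norm(train_feats - query_feat, axis=1)
    scores = -dists

    # Exponential-mechanism-style weights; subtracting the max score
    # keeps the exponentials numerically stable.
    weights = np.exp(epsilon * (scores - scores.max()) / 2.0)
    weights /= weights.sum()

    # Randomized neighbor selection (with replacement, for simplicity).
    idx = rng.choice(len(train_feats), size=k, replace=True, p=weights)
    neighbor_mean = train_probs[idx].mean(axis=0)

    # Shrink the blending weight until the predicted label is unchanged.
    label = int(np.argmax(query_probs))
    for _ in range(20):
        blended = (1.0 - alpha) * query_probs + alpha * neighbor_mean
        if int(np.argmax(blended)) == label:
            return blended
        alpha *= 0.5
    # Fall back to the unmodified output if no weight preserved the label.
    return query_probs.copy()

# Toy usage with synthetic data (3 classes, 4-dimensional features).
rng = np.random.default_rng(0)
train_feats = rng.normal(size=(50, 4))
raw = rng.normal(size=(50, 3))
train_probs = np.exp(raw) / np.exp(raw).sum(axis=1, keepdims=True)
query_feat = rng.normal(size=4)
query_probs = np.array([0.90, 0.07, 0.03])
smoothed = neighborhood_blend(query_feat, query_probs,
                              train_feats, train_probs, rng=rng)
```

Because the result is a convex combination of two probability vectors, it remains a valid distribution, and the label check means the top-1 prediction returned to the caller matches the model's original one.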
