Neural models for vulnerability prediction (VP) have achieved impressive performance by learning from large-scale code repositories. However, their susceptibility to Membership Inference Attacks (MIAs), in which adversaries aim to infer whether a particular code sample was used during training, poses serious privacy concerns. While MIA has been widely investigated in the NLP and vision domains, its impact on security-critical code analysis tasks remains underexplored. In this work, we conduct the first comprehensive analysis of MIA on VP models, evaluating attack success across model architectures (LSTM, BiGRU, and CodeBERT) and feature combinations, including embeddings, logits, loss, and confidence. Our threat model covers black-box and gray-box settings in which prediction outputs are observable, allowing adversaries to infer membership by analyzing output discrepancies between training and non-training samples. Our empirical findings reveal that logits and loss are the most informative and most vulnerable outputs for membership leakage. Motivated by these observations, we propose a Noise-based Membership Inference Defense (NMID), a lightweight defense module that applies output masking and Gaussian noise injection to disrupt adversarial inference. Extensive experiments demonstrate that NMID significantly reduces MIA effectiveness, lowering the attack AUC from nearly 1.0 to below 0.65, while preserving the predictive utility of VP models. Our study highlights critical privacy risks in code analysis and offers actionable defense strategies for securing AI-powered software systems.
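To make the defense idea concrete, the sketch below shows one plausible way output masking and Gaussian noise injection could be combined at inference time. It is a minimal illustration of the general technique, not the paper's NMID implementation; the masking rule (top-k retention), the noise scale, and the function name are assumptions chosen for clarity.

```python
import numpy as np

def noisy_masked_output(logits, noise_std=0.5, top_k=1, rng=None):
    """Perturb served logits before exposing them to a caller.

    Two illustrative steps, mirroring the defense described above:
      1. Output masking: keep only the top_k logits and suppress the rest,
         hiding fine-grained score differences.
      2. Gaussian noise injection: add zero-mean noise so the remaining
         scores separate members from non-members less cleanly.
    Hypothetical sketch; not the reference NMID code.
    """
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(logits, dtype=np.float64)

    # Output masking: suppress all but the top_k entries.
    masked = np.full_like(logits, fill_value=logits.min())
    top_idx = np.argsort(logits)[-top_k:]
    masked[top_idx] = logits[top_idx]

    # Gaussian noise injection on the surviving scores.
    noisy = masked + rng.normal(0.0, noise_std, size=masked.shape)

    # Return a probability vector so the predicted label is still usable.
    exp = np.exp(noisy - noisy.max())
    return exp / exp.sum()


if __name__ == "__main__":
    # Example: two-class VP output (vulnerable vs. non-vulnerable).
    # The top prediction usually survives, but the exposed confidences
    # are perturbed, which is what blunts the membership-inference signal.
    print(noisy_masked_output([2.3, -0.7], noise_std=0.5))
```

The trade-off this illustrates is the one the abstract reports: stronger masking and larger noise push the attack AUC down, while a calibrated noise scale keeps the model's predictive utility largely intact.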