A membership inference attack (MIA) aims to determine whether a specific data sample was included in the training dataset of a target model. Traditional MIA approaches rely on shadow models to mimic the target model's behavior, but their effectiveness diminishes for Large Language Model (LLM)-based recommendation systems because of the scale and complexity of the training data. This paper introduces a novel knowledge distillation-based MIA paradigm tailored to LLM-based recommendation systems. Our method constructs a reference model via distillation, applying distinct strategies to member and non-member data to enhance the reference model's discriminative capability. The paradigm then extracts fused features (e.g., confidence, entropy, loss, and hidden-layer vectors) from the reference model to train an attack model, overcoming the limitations of relying on any individual feature. Extensive experiments on extended datasets (Last.FM, MovieLens, Book-Crossing, Delicious) and diverse LLMs (T5, GPT-2, LLaMA3) demonstrate that our approach significantly outperforms shadow model-based MIAs and individual-feature baselines. These results demonstrate its practicality as a privacy attack on LLM-driven recommender systems.
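To make the idea of fused features concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that, for one sample, the reference model exposes a vector of output logits and a hidden-layer vector, and that fusion is plain concatenation; the abstract does not specify either detail. The names `softmax` and `fused_features` are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def fused_features(logits, label, hidden):
    """Build one attack-model input vector for a single sample.

    Assumption: fusion is simple concatenation of three scalar signals
    (confidence, entropy, loss) with the hidden-layer vector.
    """
    probs = softmax(np.asarray(logits, dtype=float))
    confidence = probs.max()                          # highest predicted probability
    entropy = -np.sum(probs * np.log(probs + 1e-12))  # uncertainty of the prediction
    loss = -np.log(probs[label] + 1e-12)              # cross-entropy w.r.t. the true label
    return np.concatenate(([confidence, entropy, loss],
                           np.asarray(hidden, dtype=float)))
```

Vectors built this way for known member and non-member samples could then be used to train any binary classifier as the attack model; which classifier the paper uses is not stated in the abstract.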