Large language models have revolutionized the field of NLP by achieving state-of-the-art performance on various tasks. However, there is concern that these models may disclose information contained in their training data. In this study, we focus on the summarization task and investigate the membership inference (MI) attack: given a sample and black-box access to a model's API, the attacker aims to determine whether the sample was part of the training data. We exploit text similarity and the model's resistance to document modifications as potential MI signals and evaluate their effectiveness on widely used datasets. Our results demonstrate that summarization models are at risk of exposing data membership, even when the reference summary is not available. Furthermore, we discuss several safeguards for training summarization models to protect against MI attacks and examine the inherent trade-off between privacy and utility.
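The two MI signals described above can be illustrated with a minimal sketch. This is not the authors' implementation, and several pieces are assumptions: `summarize` is a hypothetical stand-in for the black-box model API, a simplified ROUGE-1-style unigram F1 substitutes for whatever similarity metric is actually used, random word dropout is one assumed form of "document modification", and the decision threshold would need to be calibrated (for example on shadow data) rather than fixed by hand.

```python
import random
from collections import Counter
from typing import Callable

# Hypothetical black-box API: document text in, summary text out.
Summarizer = Callable[[str], str]


def unigram_f1(a: str, b: str) -> float:
    """Simplified ROUGE-1-style F1 over lowercase whitespace tokens."""
    ta, tb = Counter(a.lower().split()), Counter(b.lower().split())
    overlap = sum((ta & tb).values())  # clipped (min-count) token overlap
    if overlap == 0:
        return 0.0
    precision = overlap / sum(ta.values())
    recall = overlap / sum(tb.values())
    return 2 * precision * recall / (precision + recall)


def similarity_signal(summarize: Summarizer, document: str, reference: str) -> float:
    """Signal 1: similarity between the model's summary and the reference summary."""
    return unigram_f1(summarize(document), reference)


def robustness_signal(summarize: Summarizer, document: str,
                      n_variants: int = 5, drop_rate: float = 0.1,
                      seed: int = 0) -> float:
    """Signal 2 (no reference needed): average similarity between the summary of
    the original document and summaries of perturbed copies (assumed: word dropout)."""
    rng = random.Random(seed)
    base = summarize(document)
    words = document.split()
    scores = []
    for _ in range(n_variants):
        kept = [w for w in words if rng.random() >= drop_rate]
        scores.append(unigram_f1(base, summarize(" ".join(kept))))
    return sum(scores) / len(scores)


def predict_member(score: float, threshold: float) -> bool:
    """Flag the sample as a training member when its signal meets the threshold."""
    return score >= threshold
```

In this sketch, an attacker would use `similarity_signal` when the reference summary is available and fall back to `robustness_signal` when it is not, passing either score to `predict_member`. Both rest on the same intuition: a model tends to reproduce training summaries more faithfully, and more stably under small edits to the input, than it does for unseen documents.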