Large Language Models (LLMs) are prone to memorizing training data, which poses serious privacy risks. Two of the most prominent concerns are training data extraction and Membership Inference Attacks (MIAs). Prior research has shown that these threats are interconnected: adversaries can extract training data from an LLM by querying the model to generate a large volume of text and subsequently applying MIAs to verify whether a particular data point was included in the training set. In this study, we integrate multiple MIA techniques into the data extraction pipeline to systematically benchmark their effectiveness. We then compare their performance in this integrated setting against results from conventional MIA benchmarks, allowing us to evaluate their practical utility in real-world extraction scenarios.
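The extraction-plus-MIA pipeline described above can be sketched in a few lines. The loss-based (perplexity) attack is one common MIA signal for this setting: candidate generations are scored by the target model's average per-token negative log-likelihood, and low-loss candidates are flagged as likely training members. The function names, the toy log-probabilities, and the threshold below are illustrative assumptions, not the paper's actual method:

```python
def membership_score(token_logprobs):
    """Loss-based MIA signal: average negative log-likelihood.

    Lower scores indicate the model assigns high probability to the
    text, which correlates with membership in the training set.
    """
    return -sum(token_logprobs) / len(token_logprobs)

def rank_candidates(candidates, threshold=2.0):
    """Rank generated candidates by membership signal.

    candidates: list of (text, token_logprobs) pairs, where the
    log-probs would come from querying the target LLM (hypothetical
    values are used here). Returns (text, score, flagged) tuples,
    strongest member signal first; `threshold` is an assumed cutoff.
    """
    scored = [(text, membership_score(lps)) for text, lps in candidates]
    scored.sort(key=lambda x: x[1])  # lowest loss first
    return [(text, score, score < threshold) for text, score in scored]

# Toy usage: one low-loss (likely memorized) and one high-loss candidate.
cands = [
    ("candidate A", [-0.1, -0.2, -0.15]),
    ("candidate B", [-3.0, -2.5, -2.8]),
]
ranked = rank_candidates(cands)
```

In a real extraction pipeline the per-token log-probabilities would be obtained from the target model itself, and the threshold would be calibrated (e.g., against a reference model or held-out data) rather than fixed.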