Membership Inference Attacks (MIAs) act as a crucial auditing tool for the opaque training data of Large Language Models (LLMs). However, existing techniques predominantly rely on inaccessible model internals (e.g., logits) or suffer from poor generalization across domains in strict black-box settings where only generated text is available. In this work, we propose SimMIA, a robust MIA framework tailored for this text-only regime by leveraging an advanced sampling strategy and scoring mechanism. Furthermore, we present WikiMIA-25, a new benchmark curated to evaluate MIA performance on modern proprietary LLMs. Experiments demonstrate that SimMIA achieves state-of-the-art results in the black-box setting, rivaling baselines that exploit internal model information.