Membership inference attacks (MIAs) aim to determine whether a specific example was used to train a given language model. While prior work has explored prompt-based attacks such as ReCALL, these methods rely heavily on the assumption that conditioning on known non-member prefixes reliably suppresses the model's scores on non-member queries. We propose EM-MIA, a new membership inference approach that iteratively refines prefix effectiveness and membership scores using an expectation-maximization strategy, without requiring labeled non-member examples. To support controlled evaluation, we introduce OLMoMIA, a benchmark that enables analysis of MIA robustness under systematically varied distributional overlap and difficulty. Experiments on WikiMIA and OLMoMIA show that EM-MIA outperforms existing baselines, particularly in settings with clear distributional separability. We highlight scenarios where EM-MIA succeeds in practical settings with partial distributional overlap, while its failure cases expose fundamental limitations of current MIA methods when member and non-member distributions are nearly identical. We release our code and evaluation pipeline to encourage reproducible and robust MIA research.
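The EM-style loop the abstract describes (alternating between estimating membership scores from prefix-weighted signals and re-estimating how effective each prefix is) can be sketched on a toy score matrix. Everything here is an illustrative assumption rather than the paper's exact algorithm: the function name, the score-matrix interface, and the use of Pearson correlation as the prefix-effectiveness measure are all hypothetical stand-ins.

```python
import random
import statistics

def em_mia_scores(score_matrix, n_iters=10):
    """Hedged sketch of an EM-style membership inference loop.

    score_matrix[i][j] is a per-(candidate i, prefix j) signal, e.g. the
    change in loss when candidate i is scored conditioned on prefix j.
    (Hypothetical interface; the paper's exact scoring is not shown here.)
    """
    n, k = len(score_matrix), len(score_matrix[0])
    weights = [1.0 / k] * k  # prefix effectiveness, initialized uniform
    members = [0.0] * n      # membership scores
    for _ in range(n_iters):
        # E-step: membership score = effectiveness-weighted prefix signals.
        members = [sum(w * s for w, s in zip(weights, row))
                   for row in score_matrix]
        # M-step: treat a prefix as effective if its signal correlates
        # with the current membership estimates (Pearson correlation is
        # an assumption; other separability measures would also work).
        mu_m = statistics.fmean(members)
        new_w = []
        for j in range(k):
            col = [score_matrix[i][j] for i in range(n)]
            mu_c = statistics.fmean(col)
            cov = sum((m - mu_m) * (c - mu_c) for m, c in zip(members, col))
            sd_m = sum((m - mu_m) ** 2 for m in members) ** 0.5
            sd_c = sum((c - mu_c) ** 2 for c in col) ** 0.5
            corr = cov / (sd_m * sd_c) if sd_m * sd_c > 0 else 0.0
            new_w.append(max(corr, 0.0))  # clip anti-correlated prefixes
        total = sum(new_w) or 1.0
        weights = [w / total for w in new_w]
    return members, weights

# Toy demo: prefix 0 carries a real membership signal, prefix 1 is noise.
random.seed(0)
scores = ([[1.0 + 0.10 * i, random.random()] for i in range(4)]   # "members"
          + [[0.2 + 0.05 * i, random.random()] for i in range(4)])  # "non-members"
members, weights = em_mia_scores(scores)
```

On this toy matrix the loop concentrates weight on the informative prefix and the weighted membership scores separate the two groups, which is the behavior the abstract attributes to the EM refinement in separable settings.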