Latent diffusion models have achieved remarkable success in high-fidelity text-to-image generation, but their tendency to memorize training data raises critical privacy and intellectual property concerns. Membership inference attacks (MIAs) provide a principled way to audit such memorization by determining whether a given sample was included in training. However, existing approaches assume access to ground-truth captions. This assumption fails in realistic scenarios where only images are available and their textual annotations remain undisclosed, rendering prior methods ineffective when ground-truth captions are replaced with vision-language model (VLM) captions. In this work, we propose MoFit, a caption-free MIA framework that constructs synthetic conditioning inputs explicitly overfitted to the target model's generative manifold. Given a query image, MoFit proceeds in two stages: (i) model-fitted surrogate optimization, where a perturbation applied to the query image is optimized so that the resulting surrogate falls within regions of the model's unconditional prior learned from member samples, and (ii) surrogate-driven embedding extraction, where a model-fitted embedding is derived from the surrogate and then used as a mismatched condition for the query image. This embedding amplifies conditional loss responses for member samples while leaving hold-out samples relatively less affected, thereby enhancing separability in the absence of ground-truth captions. Our comprehensive experiments across multiple datasets and diffusion models demonstrate that MoFit consistently outperforms prior VLM-conditioned baselines and achieves performance competitive with caption-dependent methods.
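The two-stage procedure can be sketched with a deliberately simplified 1-D analogue. Everything below is a hypothetical stand-in, not the paper's implementation: the unconditional diffusion loss is replaced by squared distance to a toy "member" set, the surrogate is fit by finite-difference gradient descent on a perturbation, and the model-fitted embedding is taken to be the surrogate itself. The intent is only to illustrate why member queries (already on the learned manifold) yield a different loss response than hold-outs.

```python
import math

# Toy "training set" the model has memorized (hypothetical stand-in).
MEMBERS = [0.0, 2.0, 5.0]

def uncond_loss(x):
    """Stand-in for the unconditional diffusion loss: low in regions
    of the prior learned from member samples."""
    return min((x - m) ** 2 for m in MEMBERS)

def fit_surrogate(query, steps=200, lr=0.05, eps=1e-4):
    """Stage (i): optimize a perturbation delta so that query + delta
    lands in low-loss regions of the unconditional prior.
    Uses finite-difference gradient descent for simplicity."""
    delta = 0.0
    for _ in range(steps):
        g = (uncond_loss(query + delta + eps)
             - uncond_loss(query + delta - eps)) / (2 * eps)
        delta -= lr * g
    return query + delta

def membership_score(query):
    """Stage (ii): derive a model-fitted "embedding" from the surrogate
    (here, the surrogate itself) and use it as a mismatched condition;
    the conditional-loss response is the gap between query and embedding."""
    surrogate = fit_surrogate(query)
    embedding = surrogate
    return (query - embedding) ** 2  # near zero for members, larger for hold-outs

member_query, holdout_query = 2.0, 3.4
print(membership_score(member_query) < membership_score(holdout_query))  # True
```

In this toy setting a member query sits at a minimum of the unconditional loss, so the optimized perturbation stays near zero and the mismatched-condition gap vanishes, whereas a hold-out query must travel to reach the manifold, producing the larger loss response that the score exploits.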