Membership inference attacks (MIAs) test whether a specific audio clip was used to train a model, making them a key tool for auditing generative music models for copyright compliance. However, loss-based signals (e.g., reconstruction error) are only weakly aligned with human perception in practice, yielding poor separability at the low false-positive rates (FPRs) required for forensics. We propose the Latent Stability Adversarial Probe (LSA-Probe), a white-box method that measures a geometric property of the reverse diffusion process: the minimal time-normalized perturbation budget needed to cross a fixed perceptual degradation threshold at an intermediate diffusion state. We show that training members, which reside in more stable regions, exhibit a significantly higher degradation cost than non-members.
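The core quantity, the smallest perturbation budget that pushes the output past a fixed degradation threshold, can be illustrated with a toy sketch. This is not the LSA-Probe implementation. The names `min_budget`, `decode`, `dist`, `tau`, and `sigma_t` are hypothetical, the linear `decode` stands in for the reverse diffusion from an intermediate state, the L2 distance stands in for a perceptual metric, scaling the perturbation by `sigma_t` is only one assumed form of time normalization, and the perturbation direction is fixed rather than found adversarially. The search also assumes degradation grows monotonically with the budget.

```python
import numpy as np

def min_budget(decode, x_t, direction, dist, tau, sigma_t, hi=1.0, iters=40):
    """Bisect for the smallest eps whose perturbation degrades the output by >= tau.

    The perturbation applied to x_t is eps * sigma_t * unit(direction), so eps
    is expressed in units of the (assumed) noise scale sigma_t at this step.
    """
    base = decode(x_t)
    unit = direction / np.linalg.norm(direction)

    def degraded(eps):
        return dist(base, decode(x_t + eps * sigma_t * unit)) >= tau

    # Grow the upper bracket until the threshold is crossed (or give up).
    for _ in range(60):
        if degraded(hi):
            break
        hi *= 2.0
    else:
        return float("inf")

    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if degraded(mid):
            hi = mid
        else:
            lo = mid
    return hi

# Toy stand-ins: a linear "decoder" whose gain controls local sensitivity.
l2 = lambda a, b: float(np.linalg.norm(a - b))
x_t = np.zeros(8)
direction = np.ones(8)

# Low gain = stable region (member-like); high gain = unstable (non-member-like).
member_cost = min_budget(lambda z: 0.5 * z, x_t, direction, l2, tau=1.0, sigma_t=0.5)
nonmember_cost = min_budget(lambda z: 2.0 * z, x_t, direction, l2, tau=1.0, sigma_t=0.5)
print(member_cost, nonmember_cost)
```

In this toy setting the stable (low-gain) map needs a larger budget to reach the same degradation, which mirrors the claim that members exhibit a higher degradation cost. A real probe would replace the fixed direction with an adversarial search and the L2 distance with a perceptual audio metric.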