Detecting AI-generated music is crucial for preserving artistic authenticity and preventing the misuse of generative music technologies. However, existing discriminative detectors typically rely on generated samples during training and often suffer from severe performance degradation when confronted with music produced by unseen generators, which limits their real-world applicability. To address this issue, we formulate a zero-shot setting for AI-generated music detection, where the detector is trained exclusively on real music without access to any generated samples. Under this setting, we propose MusicDET, a generator-agnostic detection framework based on frequency-guided normalizing flows that probabilistically models the distribution of real music features. By evaluating the likelihood of an input sample under the learned real-music distribution, MusicDET enables effective detection of out-of-distribution music signals. Experiments on the FakeMusicCaps and SONICS datasets show that MusicDET consistently outperforms conventional discriminative detectors, particularly when detecting music generated by previously unseen models.
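The likelihood-based decision rule described above can be illustrated with a minimal sketch. It is not MusicDET's architecture: the frequency-guided flow is replaced by a single affine flow layer (equivalent to fitting a Gaussian), and the feature vectors are synthetic placeholders rather than real music features. The names `AffineFlow`, `is_generated`, the 8-dimensional features, and the 5% false-alarm threshold are all assumptions made for illustration. What the sketch does preserve is the zero-shot setup: the model is fit on real samples only, and the threshold is chosen from held-out real samples, so no generated data is used before test time.

```python
import numpy as np


class AffineFlow:
    """Single invertible affine layer z = L^{-1}(x - mu) with a standard-normal base.

    Illustrative stand-in for a deeper normalizing flow; not the paper's model.
    """

    def fit(self, x):
        # Closed-form maximum likelihood for an affine flow:
        # match the sample mean and (biased) sample covariance.
        self.mu = x.mean(axis=0)
        cov = np.cov(x, rowvar=False, bias=True) + 1e-6 * np.eye(x.shape[1])
        self.chol = np.linalg.cholesky(cov)  # cov = chol @ chol.T
        return self

    def log_prob(self, x):
        # Change of variables: log p(x) = log N(z; 0, I) - log|det chol|
        z = np.linalg.solve(self.chol, (x - self.mu).T).T
        d = x.shape[1]
        log_base = -0.5 * (z ** 2).sum(axis=1) - 0.5 * d * np.log(2 * np.pi)
        log_det = np.log(np.diag(self.chol)).sum()
        return log_base - log_det


# Synthetic placeholder features (assumption): real music ~ N(0, I),
# while the "unseen generator" produces a shifted distribution.
rng = np.random.default_rng(0)
real_train = rng.normal(0.0, 1.0, size=(2000, 8))
real_val = rng.normal(0.0, 1.0, size=(500, 8))
generated = rng.normal(2.0, 1.0, size=(500, 8))

# Training uses real samples only.
flow = AffineFlow().fit(real_train)

# Threshold from held-out real data: tolerate about 5% false alarms.
threshold = np.percentile(flow.log_prob(real_val), 5)


def is_generated(x):
    """Flag samples whose likelihood under the real-music model is too low."""
    return flow.log_prob(x) < threshold


print("flagged fraction, generated:", is_generated(generated).mean())
print("flagged fraction, real:     ", is_generated(real_val).mean())
```

In this toy setting the separation comes entirely from the mean shift in the synthetic data; on real audio features, a more expressive flow would presumably be needed to capture the real-music distribution well enough for the likelihood gap to be informative.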