AI music generators have advanced to the point where their outputs are often indistinguishable from human compositions. While detection methods have emerged, they are typically designed and validated in music streaming contexts with clean, full-length tracks. Broadcast audio, however, poses a different challenge: music appears as short excerpts, often masked by dominant speech, conditions under which existing detectors fail. In this work, we introduce AI-OpenBMAT, the first dataset tailored to broadcast-style AI-music detection. It contains 3,294 one-minute audio excerpts (54.9 hours) that follow the duration patterns and loudness relations of real television audio, combining human-made production music with stylistically matched continuations generated with Suno v3.5. We benchmark a CNN baseline and state-of-the-art SpectTTTra models for robustness to signal-to-noise ratio (SNR) and music duration, and evaluate them in a full broadcast scenario. Across all settings, models that excel in streaming scenarios degrade substantially, with F1-scores dropping below 60% when music is in the background or short in duration. These results highlight speech masking and short music duration as critical open challenges for AI music detection, and position AI-OpenBMAT as a benchmark for developing detectors that meet industrial broadcast requirements.