Text-To-Music (TTM) models have recently revolutionized the field of automatic music generation by surpassing all previous state-of-the-art models and by lowering the technical proficiency required to use them. For these reasons, they have quickly been adopted in commercial applications and music production practices. This widespread diffusion of TTMs raises serious concerns regarding copyright violation and rightful attribution, which demand careful consideration from the audio forensics community. In this paper, we tackle the problem of detecting and attributing TTM-generated data. We propose FakeMusicCaps, a dataset containing several versions of the music-caption pairs dataset MusicCaps, each re-generated via a different state-of-the-art TTM technique. We evaluate the proposed dataset through initial experiments on the detection and attribution of TTM-generated audio.