Text-To-Music (TTM) models have recently revolutionized the field of automatic music generation, both by outperforming all previous state-of-the-art models and by lowering the technical proficiency needed to use them. For these reasons, they have quickly been adopted in commercial applications and music production practices. This widespread diffusion of TTMs raises several concerns regarding copyright violation and rightful attribution, which demand serious consideration from the audio forensics community. In this paper, we tackle the problem of detecting and attributing TTM-generated data. We propose FakeMusicCaps, a dataset containing several versions of the music-caption pairs dataset MusicCaps, re-generated via several state-of-the-art TTM techniques. We evaluate the proposed dataset through initial experiments on the detection and attribution of TTM-generated audio.