Machine-generated music (MGM) has emerged as a powerful tool with applications in music therapy, personalised editing, and creative inspiration for the music community. However, its unregulated use threatens the entertainment, education, and arts sectors by diminishing the value of high-quality human compositions. Machine-generated music detection (MGMD) is therefore critical to safeguarding these domains, yet the field lacks comprehensive datasets to support meaningful progress. To address this gap, we introduce \textbf{M6}, a large-scale benchmark dataset tailored for MGMD research. M6 is distinguished by its diversity, encompassing multiple generators, domains, languages, cultural contexts, genres, and instruments. We outline our methodology for data selection and collection, present a detailed data analysis, and provide all music in WAV format. Additionally, we report baseline performance scores using foundational binary classification models, illustrating the complexity of MGMD and the significant room for improvement. By offering a robust and multifaceted resource, we aim to empower future research to develop more effective detection methods for MGM. We believe M6 will serve as a critical step toward addressing this societal challenge. The dataset and code will be freely available to support open collaboration and innovation in this field.