We introduce the Song Describer dataset (SDD), a new crowdsourced corpus of high-quality audio-caption pairs, designed for the evaluation of music-and-language models. The dataset consists of 1.1k human-written natural language descriptions of 706 music recordings, all publicly accessible and released under Creative Commons licenses. To showcase the use of our dataset, we benchmark popular models on three key music-and-language tasks (music captioning, text-to-music generation, and music-language retrieval). Our experiments highlight the importance of cross-dataset evaluation and offer insights into how researchers can use SDD to gain a broader understanding of model performance.