We introduce the Song Describer dataset (SDD), a new crowdsourced corpus of high-quality audio-caption pairs designed for the evaluation of music-and-language models. The dataset consists of 1.1k human-written natural language descriptions of 706 music recordings, all publicly accessible and released under Creative Commons licenses. To showcase the use of our dataset, we benchmark popular models on three key music-and-language tasks: music captioning, text-to-music generation, and music-language retrieval. Our experiments highlight the importance of cross-dataset evaluation and offer insights into how researchers can use SDD to gain a broader understanding of model performance.