We introduce the Song Describer dataset (SDD), a new crowdsourced corpus of high-quality audio-caption pairs designed for the evaluation of music-and-language models. The dataset consists of 1.1k human-written natural language descriptions of 706 music recordings, all publicly accessible and released under Creative Commons licenses. To showcase the use of our dataset, we benchmark popular models on three key music-and-language tasks: music captioning, text-to-music generation, and music-language retrieval. Our experiments highlight the importance of cross-dataset evaluation and offer insights into how researchers can use SDD to gain a broader understanding of model performance.