Rapidly evolving multimodal Large Language Models (LLMs) urgently require new benchmarks that uniformly evaluate their ability to understand music and describe it in text. However, existing music description datasets cannot serve as such benchmarks for three reasons: the semantic gap between Music Information Retrieval (MIR) algorithms and human understanding, the discrepancy between how professionals and the general public describe music, and the low precision of existing annotations. To address this, we present MuChin, the first open-source music description benchmark in colloquial Chinese, designed to evaluate how well multimodal LLMs understand and describe music. We established the Caichong Music Annotation Platform (CaiMAP), which employs an innovative multi-person, multi-stage quality assurance method. We also recruited both amateurs and professionals so that the annotations are precise and aligned with popular semantics. Using this method, we built the Caichong Music Dataset (CaiMD), a dataset with multi-dimensional, high-precision music annotations, and carefully selected 1,000 high-quality entries from it to serve as the MuChin test set. Based on MuChin, we analyzed the discrepancies between professionals and amateurs in describing music, and we empirically demonstrated the effectiveness of the annotated data for fine-tuning LLMs. Finally, we used MuChin to evaluate existing music understanding models on their ability to provide colloquial descriptions of music. All data related to the benchmark, along with the scoring code and detailed appendices, have been open-sourced (https://github.com/CarlWangChina/MuChin/).