Multimodal Information Retrieval has made significant progress in recent years, leveraging the increasingly strong multimodal abilities of deep pre-trained models to represent information across modalities. Music Information Retrieval (MIR), in particular, has considerably increased in quality, with neural representations of music even making their way into everyday consumer products. However, there is a lack of high-quality benchmarks for evaluating music retrieval performance. To address this issue, we introduce \textbf{IncompeBench}, a carefully annotated benchmark comprising $1,574$ permissively licensed, high-quality music snippets, $500$ diverse queries, and over $125,000$ individual relevance judgements. These annotations were created using a multi-stage pipeline, resulting in high agreement between human annotators and the generated data. The resulting datasets are publicly available at https://huggingface.co/datasets/mixedbread-ai/incompebench-strict and https://huggingface.co/datasets/mixedbread-ai/incompebench-lenient, with the prompts available at https://github.com/mixedbread-ai/incompebench-programs.