While music generation models have evolved to handle complex multimodal inputs that mix text, lyrics, and reference audio, evaluation mechanisms have lagged behind. In this paper, we bridge this gap by establishing a comprehensive ecosystem for music reward modeling under Compositional Multimodal Instruction (CMI), where the generated music may be conditioned on text descriptions, lyrics, and audio prompts. We first introduce CMI-Pref-Pseudo, a large-scale preference dataset comprising 110k pseudo-labeled samples, and CMI-Pref, a high-quality, human-annotated corpus tailored for fine-grained alignment tasks. We then propose CMI-RewardBench, a unified benchmark that evaluates music reward models on heterogeneous samples across three dimensions: musicality, text-music alignment, and compositional instruction alignment. Leveraging these resources, we develop CMI reward models (CMI-RMs), a parameter-efficient reward model family capable of processing heterogeneous inputs. We evaluate how well their scores correlate with human judgments of musicality and alignment on CMI-Pref and on previous datasets. Further experiments demonstrate that CMI-RM not only correlates strongly with human judgments, but also enables effective inference-time scaling via top-k filtering. Code is available on GitHub (https://github.com/Haiwen-Xia/CMI-RewardBench). Model weights: CMI-RM (https://huggingface.co/HaiwenXia/CMI-RM). Datasets: CMI-Pref-Pseudo (https://huggingface.co/datasets/HaiwenXia/cmi-pref-pseudo) and CMI-Pref (https://huggingface.co/datasets/HaiwenXia/cmi-pref).
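The inference-time scaling via top-k filtering mentioned in the abstract can be illustrated with a minimal sketch: sample several candidate generations for one instruction, score each with a reward model, and keep only the k highest-scoring ones. The abstract does not specify the exact procedure, so this is an illustration under assumptions rather than the authors' implementation. The names `top_k_filter` and `score_fn` are hypothetical, and `score_fn` stands in for whatever call produces a scalar reward from a CMI-RM checkpoint.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

C = TypeVar("C")  # a candidate generation, e.g. a path to an audio clip


def top_k_filter(
    candidates: Sequence[C],
    score_fn: Callable[[C], float],  # hypothetical: wraps a reward model
    k: int,
) -> List[Tuple[C, float]]:
    """Return the k candidates with the highest reward, best first.

    Assumption: a higher score means a better sample. Ties keep their
    original order because Python's sort is stable.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Score every candidate exactly once.
    scored = [(candidate, score_fn(candidate)) for candidate in candidates]
    # Sort by reward, highest first, and keep the first k entries.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

In this sketch, increasing the number of sampled candidates while keeping k fixed is what spends more inference-time compute; the reward model only reranks and never changes the generator itself.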