Singing voice generation is progressing rapidly, yet evaluating singing quality remains a critical challenge. Human subjective assessment, typically conducted as listening tests, is costly and time-consuming, while existing objective metrics capture only a limited set of perceptual aspects. In this work, we introduce SingMOS-Pro, a dataset for automatic singing quality assessment. Our preview version, SingMOS, provides only overall ratings; SingMOS-Pro builds on it by annotating the newly added portion for lyrics, melody, and overall quality, offering broader coverage and greater diversity. The dataset contains 7,981 singing clips generated by 41 models across 12 datasets, spanning early systems to recent advances. Each clip receives at least five ratings from professional annotators, which helps ensure reliability and consistency. Furthermore, we explore how to effectively use MOS data annotated under different standards, and we benchmark several widely used evaluation methods from related tasks on SingMOS-Pro, establishing strong baselines and practical references for future research. The dataset can be accessed at https://huggingface.co/datasets/TangRain/SingMOS-Pro.