In recent years, AI-generated music has made significant progress, with several models performing well across multimodal settings and complex musical genres and scenes. Objective metrics can be used to evaluate generated music, but they often lack interpretability from a musical standpoint. Researchers therefore often resort to subjective user studies to assess the quality of generated works, which can be resource-intensive and less reproducible than objective metrics. This study comprehensively evaluates subjective, objective, and combined methodologies for assessing AI-generated music, highlighting the advantages and disadvantages of each approach. Ultimately, it provides a reference for unifying evaluation practice for generative AI in music.