Music popularity prediction has attracted growing research interest because of its relevance to artists, platforms, and recommendation systems. However, the rapid rise of AI-generated music platforms has created a new and largely unexplored landscape, in which large numbers of songs are produced and consumed daily without the traditional markers of artist reputation or label backing. One key factor that remains unexplored in this setting is aesthetic quality. We propose APEX, the first large-scale multi-task learning framework for AI-generated music. Trained on over 211k songs (10k hours of audio) from Suno and Udio, APEX jointly predicts two engagement-based popularity signals (stream and like scores) alongside five perceptual aesthetic quality dimensions, using frozen audio embeddings extracted from MERT, a self-supervised music understanding model. Aesthetic quality and popularity capture complementary aspects of music, and combining them proves valuable: in an out-of-distribution evaluation on the Music Arena dataset, which comprises pairwise human preference battles across eleven generative music systems unseen during training, including aesthetic features consistently improves preference prediction, demonstrating strong generalisation of the learned representations across generative architectures.
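To make the multi-task setup described in the abstract concrete, the following is a minimal sketch of a shared trunk with one regression head per target, applied to a precomputed (frozen) embedding. It is not the APEX implementation: the layer sizes, the task names, the unweighted squared-error loss, and the random stand-in for a MERT embedding are all assumptions made for illustration, and the abstract does not name the five aesthetic dimensions.

```python
import random

# Task names are illustrative: two popularity targets plus five aesthetic
# dimensions (placeholders, since the abstract does not name them).
TASKS = ["streams", "likes"] + [f"aesthetic_{i}" for i in range(1, 6)]


def make_linear(n_in, n_out, rng):
    """Return (weights, bias) for a dense layer with small random weights."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(layer, x):
    """Apply a dense layer to a vector given as a list of floats."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


class MultiTaskHead:
    """Shared trunk plus one scalar regression head per task (assumed design)."""

    def __init__(self, embed_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.trunk = make_linear(embed_dim, hidden_dim, rng)
        self.heads = {t: make_linear(hidden_dim, 1, rng) for t in TASKS}

    def predict(self, embedding):
        # The embedding is a fixed input: the audio encoder is never updated.
        hidden = relu(linear(self.trunk, embedding))
        return {t: linear(head, hidden)[0] for t, head in self.heads.items()}


def multitask_mse(pred, target):
    """Unweighted mean of per-task squared errors (an assumed loss)."""
    return sum((pred[t] - target[t]) ** 2 for t in TASKS) / len(TASKS)


# Random stand-in for the frozen embedding of one song (dimension is arbitrary).
rng = random.Random(1)
embedding = [rng.gauss(0.0, 1.0) for _ in range(16)]
model = MultiTaskHead(embed_dim=16, hidden_dim=8)
pred = model.predict(embedding)
loss = multitask_mse(pred, {t: 0.0 for t in TASKS})
```

Because every head reads the same hidden representation, the popularity and aesthetic targets share what the trunk learns, which is the general motivation for a multi-task design. A real system would train the trunk and heads by gradient descent on batches of embeddings.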