Recent commercial systems such as Suno demonstrate strong capabilities in long-form song generation, while academic research remains largely non-reproducible due to the lack of publicly available training data, hindering fair comparison and progress. To this end, we release a fully open-source system for long-form song generation with fine-grained style conditioning, including a licensed synthetic dataset, training and evaluation pipelines, and Muse, an easy-to-deploy song generation model. The dataset consists of 116k fully licensed synthetic songs, each paired with automatically generated lyrics, a style description, and audio synthesized by SunoV5. We train Muse via single-stage supervised finetuning of a Qwen-based language model whose vocabulary is extended with discrete audio tokens from MuCodec, without task-specific losses, auxiliary objectives, or additional architectural components. Our evaluations find that although Muse is trained at a modest data scale and model size, it achieves competitive performance on phoneme error rate, text--music style similarity, and audio aesthetic quality, while enabling controllable segment-level generation across different musical structures. All data, model weights, and training and evaluation pipelines will be publicly released, paving the way for continued progress in controllable long-form song generation research. The project repository is available at https://github.com/yuhui1038/Muse.
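To make the training recipe concrete, the sketch below illustrates the general idea described above: extending a Qwen-style causal language model's vocabulary with discrete audio tokens and optimizing only the standard next-token loss over a serialized style/lyrics/audio sequence. This is a minimal sketch, not the released Muse code; the base checkpoint name, codebook size, special-token names, and sequence layout are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the released Muse implementation):
# extend a Qwen-style causal LM with discrete audio tokens and train it
# with plain next-token cross-entropy, i.e. single-stage SFT with no
# task-specific losses or extra architectural components.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "Qwen/Qwen2.5-0.5B"   # assumed base checkpoint
NUM_AUDIO_TOKENS = 16384            # assumed MuCodec codebook size

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# One new token per discrete audio code, plus boundary markers for the
# conditioning (style description, lyrics) and the audio stream.
audio_tokens = [f"<audio_{i}>" for i in range(NUM_AUDIO_TOKENS)]
special_tokens = ["<style>", "</style>", "<lyrics>", "</lyrics>", "<audio>", "</audio>"]
tokenizer.add_tokens(audio_tokens + special_tokens)
model.resize_token_embeddings(len(tokenizer))

def build_example(style: str, lyrics: str, audio_codes: list[int]) -> torch.Tensor:
    """Serialize one song as a single token sequence: style + lyrics + audio codes."""
    prefix = f"<style>{style}</style><lyrics>{lyrics}</lyrics><audio>"
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids[0]
    audio_ids = torch.tensor(
        tokenizer.convert_tokens_to_ids([f"<audio_{c}>" for c in audio_codes])
    )
    end_ids = torch.tensor(tokenizer.convert_tokens_to_ids(["</audio>"]))
    return torch.cat([prefix_ids, audio_ids, end_ids])

# One training step: ordinary causal-LM loss over the whole sequence.
example = build_example("upbeat pop, female vocals", "la la la", audio_codes=[1, 2, 3])
input_ids = example.unsqueeze(0)
loss = model(input_ids=input_ids, labels=input_ids).loss
loss.backward()
```

At inference time, the same serialization would be used as a prompt (style and lyrics up to the `<audio>` marker), and the model would autoregressively emit audio tokens that a MuCodec decoder turns back into a waveform.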