ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation

We present ACE-Step v1.5, a highly efficient open-source music foundation model that brings commercial-grade generation to consumer hardware. On commonly used evaluation metrics, ACE-Step v1.5 achieves quality beyond most commercial music models while remaining extremely fast -- under 2 seconds per full song on an A100 and under 10 seconds on an RTX 3090. The model runs locally with less than 4GB of VRAM, and supports lightweight personalization: users can train a LoRA from just a few songs to capture their own style. At its core lies a novel hybrid architecture where the Language Model (LM) functions as an omni-capable planner: it transforms simple user queries into comprehensive song blueprints -- scaling from short loops to 10-minute compositions -- while synthesizing metadata, lyrics, and captions via Chain-of-Thought to guide the Diffusion Transformer (DiT). Uniquely, this alignment is achieved through intrinsic reinforcement learning relying solely on the model's internal mechanisms, thereby eliminating the biases inherent in external reward models or human preferences. Beyond standard synthesis, ACE-Step v1.5 unifies precise stylistic control with versatile editing capabilities -- such as cover generation, repainting, and vocal-to-BGM conversion -- while maintaining strict adherence to prompts across 50+ languages. This paves the way for powerful tools that seamlessly integrate into the creative workflows of music artists, producers, and content creators. The code, the model weights and the demo are available at: https://ace-step.github.io/ace-step-v1.5.github.io/
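The abstract does not describe how the LoRA personalization is configured, so the following is only a minimal sketch of the generic LoRA formulation, in which a frozen weight W is adapted as W + (alpha / r) * B A with small trainable matrices A and B. The function names, layer size, and rank are assumptions chosen for illustration and are not taken from ACE-Step's code.

```python
# Illustrative only: generic LoRA arithmetic, not ACE-Step's actual training code.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank.

    w: frozen base weight, shape (d_out, d_in)
    a: trainable down-projection, shape (r, d_in)
    b: trainable up-projection, shape (d_out, r)
    """
    r = len(a)  # rank is the number of rows of A
    scale = alpha / r
    delta = matmul(b, a)  # low-rank update, shape (d_out, d_in)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def lora_param_counts(d_out, d_in, r):
    """Trainable parameters: full fine-tuning versus a rank-r adapter."""
    return d_out * d_in, r * (d_in + d_out)


# Hypothetical layer size and rank, chosen only for illustration.
full, adapter = lora_param_counts(d_out=1024, d_in=1024, r=8)
```

Under these assumed dimensions the adapter trains r * (d_in + d_out) parameters instead of d_out * d_in for the layer, which is the general reason a LoRA can plausibly be fitted from a small amount of data such as a few songs.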
