Automatic music arrangement streamlines the creation of musical variants for composers and arrangers, reducing reliance on extensive music expertise. However, existing methods suffer from inefficient tokenization, underutilization of pre-trained music language models (LMs), and suboptimal fidelity and coherence in generated arrangements. This paper introduces an efficient multitrack music tokenizer for unconditional and conditional symbolic music generation, along with a unified sequence-to-sequence reconstruction fine-tuning objective for pre-trained music LMs that balances task-specific needs with coherence constraints. Our approach achieves state-of-the-art results on band arrangement, piano reduction, and drum arrangement, surpassing task-specific models in both objective metrics and perceptual quality. Additionally, we demonstrate that generative pre-training contributes significantly to performance across these arrangement tasks, especially when handling long segments with complex alignment.