Many music AI models learn a mapping between music content and human-defined labels. However, many annotations, such as chords, can be naturally expressed within the music modality itself, e.g., as sequences of symbolic notes. This observation allows both understanding tasks (e.g., chord recognition) and conditional generation tasks (e.g., chord-conditioned melody generation) to be unified under a music-for-music sequence modeling paradigm. In this work, we propose parameter-efficient solutions for a variety of symbolic music-for-music tasks. The high-level idea is that (1) we use a pretrained language model (LM) for both the reference and the target sequence, and (2) we link these two LMs via a lightweight adapter. Experiments show that our method achieves superior performance across tasks such as chord recognition, melody generation, and drum track generation. All demos, code, and model weights are publicly available.
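To make the two-LM-plus-adapter idea concrete, below is a minimal PyTorch sketch of one plausible wiring: a frozen reference LM encodes the reference sequence (e.g., chords), a small trainable adapter projects its hidden states, and the projected states condition a frozen target LM (e.g., for melody). `TinyLM`, `AdapterLink`, the prefix-style conditioning, and all dimensions are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained LM; positional encodings and causal
# masking are omitted for brevity. Hypothetical, not the paper's backbone.
class TinyLM(nn.Module):
    def __init__(self, vocab, dim):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, vocab)

    def hidden(self, tokens):               # (B, T) -> (B, T, dim)
        return self.encoder(self.embed(tokens))

    def logits_from_states(self, states):   # (B, T, dim) -> (B, T, vocab)
        return self.head(self.encoder(states))

# Lightweight adapter linking the two LMs; only these weights are trained.
class AdapterLink(nn.Module):
    def __init__(self, ref_dim, tgt_dim):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(ref_dim, tgt_dim), nn.GELU(), nn.Linear(tgt_dim, tgt_dim)
        )

    def forward(self, ref_states):
        return self.proj(ref_states)

# Wiring: freeze both LMs, encode the reference track, map its hidden
# states through the adapter, and prepend them as a conditioning prefix
# for the target LM.
ref_lm, tgt_lm = TinyLM(vocab=128, dim=64), TinyLM(vocab=512, dim=96)
for p in ref_lm.parameters():
    p.requires_grad = False
for p in tgt_lm.parameters():
    p.requires_grad = False
adapter = AdapterLink(ref_dim=64, tgt_dim=96)

ref_tokens = torch.randint(0, 128, (2, 16))   # reference sequence (e.g., chords)
tgt_tokens = torch.randint(0, 512, (2, 32))   # target sequence (e.g., melody)
with torch.no_grad():
    ref_states = ref_lm.hidden(ref_tokens)    # frozen reference encoding
prefix = adapter(ref_states)                  # the only trainable path
tgt_states = tgt_lm.embed(tgt_tokens)
logits = tgt_lm.logits_from_states(torch.cat([prefix, tgt_states], dim=1))
```

Under this reading, the same wiring serves understanding tasks (the reference is music, the target is annotation-as-music) and conditional generation (the reference is the annotation, the target is music), with only the adapter trained per task.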