Existing work in automatic music generation has largely focused on end-to-end systems that generate either entire compositions or continuations of pieces, outputs that are difficult for composers to iterate on. The area of computer-assisted composition, in which generative models integrate into existing creative workflows, remains comparatively underexplored. In this study, we address the tasks of model style adaptation and multi-track, long-context, controllable symbolic music infilling to enhance the process of computer-assisted composition. We present MIDI-RWKV, a small foundation model based on the RWKV-7 linear architecture, to enable efficient and coherent musical co-creation on edge devices. We also demonstrate that MIDI-RWKV admits an effective method of fine-tuning its initial state for style adaptation in the very-low-sample regime. We evaluate MIDI-RWKV and its state tuning on several quantitative and qualitative metrics against existing models, and release model weights and code at https://github.com/christianazinn/MIDI-RWKV.