Computational music generation is evolving towards non-conventional styles, demanding methods that enable precise and controllable blending of diverse musical elements. In this work, we present a method for fine-grained control using inference-time interventions on MusicGen, an autoregressive generative transformer. Our approach achieves genre control by steering the residual stream with the weights of a linear probe trained on it. By framing activation steering as a human-controllable interaction, our work highlights how interpretable model behaviors can empower co-creative music generation. Audio samples demonstrating our method are available on our demo page.
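To make the steering idea concrete, the following is a minimal sketch under stated assumptions, not the implementation used in this work. It does not call MusicGen: small toy vectors stand in for residual-stream activations, and the helper names (`train_linear_probe`, `steer`, `probe_score`), the logistic-regression probe, the unit normalization of the probe direction, and the strength parameter `alpha` are all hypothetical choices made for illustration.

```python
import math
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(acts, labels, lr=0.1, epochs=200):
    """Fit a logistic-regression probe (w, b) with plain SGD.

    acts: list of activation vectors (toy stand-ins for residual-stream states).
    labels: 1 for the target genre, 0 for the other genre.
    """
    w = [0.0] * len(acts[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(acts, labels):
            g = sigmoid(dot(w, x) + b) - y  # gradient of log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def steer(hidden, w, alpha):
    """Shift a hidden state along the unit-normalized probe direction.

    Assumption: steering is an additive shift of size alpha at inference time.
    """
    norm = math.sqrt(dot(w, w))
    return [h + alpha * wi / norm for h, wi in zip(hidden, w)]


def probe_score(hidden, w, b):
    """Probability the probe assigns to the target genre."""
    return sigmoid(dot(w, hidden) + b)


# Toy data: two "genres" separated along the first coordinate.
random.seed(0)
DIM = 4


def sample(center):
    return [c + random.gauss(0.0, 0.3) for c in center]


genre_a = [sample([1.0, 0.0, 0.0, 0.0]) for _ in range(20)]
genre_b = [sample([-1.0, 0.0, 0.0, 0.0]) for _ in range(20)]
acts = genre_a + genre_b
labels = [1] * 20 + [0] * 20

w, b = train_linear_probe(acts, labels)

neutral = [0.0] * DIM
steered = steer(neutral, w, alpha=2.0)
before = probe_score(neutral, w, b)
after = probe_score(steered, w, b)
print("probe score before steering:", round(before, 3))
print("probe score after steering: ", round(after, 3))
```

In this sketch, `alpha` plays the role of the human-controllable knob: larger positive values push the state further toward the target genre as judged by the probe, and negative values push it away. In a real model, the same shift would be applied to the chosen layer's residual stream at each generation step, for example through a forward hook.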