We introduce VampNet, a masked acoustic token modeling approach to music synthesis, compression, inpainting, and variation. We use a variable masking schedule during training, which allows us to sample coherent music from the model by applying a variety of masking approaches (called prompts) during inference. VampNet is non-autoregressive, leveraging a bidirectional transformer architecture that attends to all tokens in a forward pass. With just 36 sampling passes, VampNet can generate coherent high-fidelity musical waveforms. We show that by prompting VampNet in various ways, we can apply it to tasks like music compression, inpainting, outpainting, continuation, and looping with variation (vamping). Appropriately prompted, VampNet is capable of maintaining style, genre, instrumentation, and other high-level aspects of the music. This flexible prompting capability makes VampNet a powerful music co-creation tool. Code and audio samples are available online.
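To make the sampling procedure more concrete, the following is a minimal sketch of iterative masked-token decoding driven by an inpainting prompt. It is an illustration under stated assumptions, not the VampNet implementation: `toy_model` is a hypothetical stand-in that returns random logits in place of the bidirectional transformer, the cosine schedule and confidence-based re-masking are one common choice for this family of models, and the names `MASK`, `cosine_mask_fraction`, and `iterative_decode` are invented for this example. A single token stream is used for simplicity.

```python
import numpy as np

MASK = -1  # hypothetical sentinel marking a masked token position


def cosine_mask_fraction(step, num_steps):
    """Fraction of the initially masked tokens left masked after this pass."""
    return float(np.cos(0.5 * np.pi * (step + 1) / num_steps))


def toy_model(tokens, vocab_size, rng):
    """Stand-in for a bidirectional transformer: random logits per position."""
    return rng.normal(size=(tokens.shape[0], vocab_size))


def iterative_decode(tokens, vocab_size, num_steps=36, seed=0):
    """Fill MASK positions over several passes; prompt tokens stay fixed."""
    rng = np.random.default_rng(seed)
    tokens = tokens.copy()
    n_total_masked = int((tokens == MASK).sum())
    positions = np.arange(tokens.shape[0])

    for step in range(num_steps):
        masked = tokens == MASK
        if not masked.any():
            break

        # One forward pass scores every position at once (non-autoregressive).
        logits = toy_model(tokens, vocab_size, rng)
        shifted = logits - logits.max(axis=-1, keepdims=True)
        probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)

        # Sample a token per position with the Gumbel-max trick.
        sampled = np.argmax(logits + rng.gumbel(size=logits.shape), axis=-1)
        confidence = probs[positions, sampled]

        # Tentatively fill every masked position with its sample.
        candidate = np.where(masked, sampled, tokens)

        # Re-mask the least confident new tokens according to the schedule.
        n_keep_masked = int(np.floor(
            cosine_mask_fraction(step, num_steps) * n_total_masked))
        confidence = np.where(masked, confidence, np.inf)  # never re-mask fixed tokens
        if n_keep_masked > 0:
            remask = np.argsort(confidence)[:n_keep_masked]
            candidate[remask] = MASK
        tokens = candidate

    return tokens


# Inpainting-style prompt: keep the beginning and end, mask the middle.
data_rng = np.random.default_rng(1)
original = data_rng.integers(0, 16, size=32)
prompt = original.copy()
prompt[8:24] = MASK
result = iterative_decode(prompt, vocab_size=16, num_steps=36)
```

Other prompts from the list above differ only in which positions are set to `MASK` before decoding: masking the tail of the sequence corresponds to continuation, for example, while the decoding loop stays the same.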