We introduce VampNet, a masked acoustic token modeling approach to music synthesis, compression, inpainting, and variation. We use a variable masking schedule during training which allows us to sample coherent music from the model by applying a variety of masking approaches (called prompts) during inference. VampNet is non-autoregressive, leveraging a bidirectional transformer architecture that attends to all tokens in a forward pass. With just 36 sampling passes, VampNet can generate coherent high-fidelity musical waveforms. We show that by prompting VampNet in various ways, we can apply it to tasks like music compression, inpainting, outpainting, continuation, and looping with variation (vamping). Appropriately prompted, VampNet is capable of maintaining style, genre, instrumentation, and other high-level aspects of the music. This flexible prompting capability makes VampNet a powerful music co-creation tool. Code and audio samples are available online.
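To make the idea of prompting by masking more concrete, the following is a minimal sketch of how two prompt masks and a 36-pass unmasking schedule might be constructed. It is not VampNet's implementation: the function names (`inpainting_mask`, `periodic_mask`, `masked_counts`), the flat one-dimensional token sequence, and the cosine-shaped schedule are all assumptions made for illustration. The abstract only states that a variable masking schedule is used and that sampling takes 36 passes.

```python
import math

import numpy as np


def inpainting_mask(num_tokens, keep_prefix, keep_suffix):
    """Hypothetical inpainting prompt.

    True marks tokens the model must generate; False marks prompt
    tokens that are kept from the input music.
    """
    mask = np.ones(num_tokens, dtype=bool)
    mask[:keep_prefix] = False
    if keep_suffix > 0:
        mask[num_tokens - keep_suffix:] = False
    return mask


def periodic_mask(num_tokens, period):
    """Hypothetical sparse prompt: keep every `period`-th token, mask the rest."""
    mask = np.ones(num_tokens, dtype=bool)
    mask[::period] = False
    return mask


def masked_counts(num_masked, num_passes=36):
    """Tokens still masked after each sampling pass.

    Assumption: a cosine-shaped schedule, as used in some iterative
    parallel decoding schemes; the abstract does not specify the shape.
    """
    counts = []
    for step in range(1, num_passes + 1):
        frac = math.cos(0.5 * math.pi * step / num_passes)
        counts.append(int(math.floor(frac * num_masked)))
    counts[-1] = 0  # every token is resolved by the final pass
    return counts


if __name__ == "__main__":
    mask = inpainting_mask(num_tokens=100, keep_prefix=20, keep_suffix=20)
    schedule = masked_counts(int(mask.sum()))
    print(int(mask.sum()), schedule[:3], schedule[-3:])
```

In a sampler built on this sketch, each pass would run the bidirectional model over the whole sequence, commit its most confident predictions, and re-mask the remainder so that the number of masked tokens follows `masked_counts`. Changing only the mask changes the task: keeping a prefix and suffix gives inpainting, while keeping a sparse periodic subset resembles a heavily compressed prompt.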