Attention-based models such as Transformers and recurrent models such as state space models (SSMs) have emerged as successful methods for autoregressive sequence modeling. Although both enable parallel training, neither enables parallel generation due to their autoregressive nature. We propose the variational SSM (VSSM), a variational autoencoder (VAE) in which both the encoder and the decoder are SSMs. Since sampling the latent variables and decoding them with the SSM can be parallelized, both training and generation can be conducted in parallel. Moreover, the decoder recurrence allows generation to be resumed without reprocessing the whole sequence. Finally, we propose the autoregressive VSSM, which can be conditioned on a partial realization of the sequence, as is common in language generation tasks. Interestingly, the autoregressive VSSM still enables parallel generation. On toy problems (MNIST, CIFAR), we demonstrate empirical speed-up gains and show that the VSSM competes with traditional models (Transformer, Mamba SSM) in terms of generation quality.
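The parallel-generation claim rests on a standard property of linear SSMs: once the latents are sampled (independently, hence in parallel), the decoder recurrence has a closed form that a parallel scan can evaluate. The following is a minimal sketch of that property, not the paper's actual architecture; the matrices `A`, `B`, `C`, the dimensions, and the i.i.d. Gaussian latents are illustrative assumptions.

```python
import numpy as np

# Hypothetical linear SSM decoder: h_t = A h_{t-1} + B z_t, y_t = C h_t.
# If the latents z_1..z_T are sampled independently, they can all be drawn
# at once, and the recurrence unrolls to h_t = sum_{k<=t} A^(t-k) B z_k,
# which is exactly what a parallel (associative) scan computes.

rng = np.random.default_rng(0)
T, d, dz = 8, 4, 3            # sequence length, state dim, latent dim (toy)
A = 0.9 * np.eye(d)           # toy stable state matrix
B = rng.normal(size=(d, dz))
C = rng.normal(size=(2, d))
z = rng.normal(size=(T, dz))  # all latents sampled in one parallel step

# Sequential decode: the recurrent view (one step per token).
h = np.zeros(d)
ys_seq = []
for t in range(T):
    h = A @ h + B @ z[t]
    ys_seq.append(C @ h)
ys_seq = np.stack(ys_seq)

# Closed-form decode: every h_t computed independently from the latents,
# i.e. what a parallel scan evaluates without any sequential dependency.
powers = [np.linalg.matrix_power(A, k) for k in range(T)]
H = np.stack([
    sum(powers[t - k] @ (B @ z[k]) for k in range(t + 1))
    for t in range(T)
])
ys_par = H @ C.T

assert np.allclose(ys_seq, ys_par)
```

In practice the closed form is evaluated with an associative scan in O(log T) parallel depth rather than the O(T^2) sum above; the point of the sketch is only that no step of the decode depends on a previously generated *output*, which is what autoregressive Transformers and SSM decoders lack.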