State-space models are a low-complexity alternative to transformers for encoding long sequences and capturing long-term dependencies. We propose LOCOST: an encoder-decoder architecture based on state-space models for conditional text generation with long-context inputs. With a computational complexity of $O(L \log L)$ in the input length $L$, this architecture can handle significantly longer sequences than state-of-the-art models that are based on sparse attention patterns. We evaluate our model on a series of long-document abstractive summarization tasks. The model reaches 93-96% of the performance of the top-performing sparse transformers of the same size while saving up to 50% memory during training and up to 87% during inference. Additionally, LOCOST effectively handles input texts exceeding 600K tokens at inference time, setting new state-of-the-art results on full-book summarization and opening new perspectives for long input processing.
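The abstract does not say where the $O(L \log L)$ cost comes from. A minimal sketch, assuming the usual route for state-space layers: a linear state-space recurrence is equivalent to a causal convolution with a kernel of the same length as the input, and that convolution can be computed with the fast Fourier transform. Everything here is illustrative and not taken from the LOCOST implementation: the function names, the scalar (one-dimensional) state, and the parameters `a`, `b`, `c` are assumptions chosen to keep the example short.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for i in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * i / n) * odd[i]
        out[i] = even[i] + w
        out[i + n // 2] = even[i] - w
    return out

def ssm_kernel(a, b, c, length):
    """Convolution kernel k[t] = c * a**t * b of a scalar linear SSM (assumed form)."""
    return [c * (a ** t) * b for t in range(length)]

def causal_conv_fft(u, k):
    """Causal convolution y[t] = sum_{s<=t} k[s] * u[t-s], in O(L log L) time."""
    L = len(u)
    n = 1
    while n < 2 * L:  # zero-pad so the circular convolution equals the linear one
        n *= 2
    u_pad = list(u) + [0.0] * (n - L)
    k_pad = list(k[:L]) + [0.0] * (n - len(k[:L]))
    U, K = fft(u_pad), fft(k_pad)
    y = fft([p * q for p, q in zip(U, K)], inverse=True)
    return [v.real / n for v in y[:L]]  # divide by n to normalise the inverse FFT

def ssm_recurrence(u, a, b, c):
    """Reference: step x[t] = a*x[t-1] + b*u[t], y[t] = c*x[t], one token at a time."""
    x, ys = 0.0, []
    for u_t in u:
        x = a * x + b * u_t
        ys.append(c * x)
    return ys
```

Both functions produce the same outputs for the same input. The FFT route processes all $L$ positions at once in $O(L \log L)$ time, whereas full self-attention compares every pair of positions at $O(L^2)$ cost.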