State-space models are a low-complexity alternative to transformers for encoding long sequences and capturing long-term dependencies. We propose LOCOST: an encoder-decoder architecture based on state-space models for conditional text generation with long context inputs. With a computational complexity of $O(L \log L)$, this architecture can handle significantly longer sequences than state-of-the-art models based on sparse attention patterns. We evaluate our model on a series of long document abstractive summarization tasks. The model reaches 93-96% of the performance of the top-performing sparse transformers of the same size while saving up to 50% memory during training and up to 87% during inference. Additionally, LOCOST effectively handles input texts exceeding 600K tokens at inference time, setting new state-of-the-art results on full-book summarization and opening new perspectives for long input processing.
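The $O(L \log L)$ complexity stems from the fact that a state-space layer can be applied to a length-$L$ sequence as a long convolution computed via the FFT, rather than through the quadratic pairwise interactions of attention. Below is a minimal sketch of this mechanism; the function name `fft_long_conv` and the toy exponentially decaying kernel are illustrative assumptions, not LOCOST's actual learned SSM kernel or implementation.

```python
import numpy as np

def fft_long_conv(u, k):
    """Apply a length-L convolution kernel k to input u via FFT.

    Naive convolution costs O(L^2); zero-padding to 2L and
    convolving in the frequency domain costs O(L log L), which
    is the complexity regime claimed for the LOCOST encoder.
    """
    L = u.shape[-1]
    n = 2 * L  # zero-pad to avoid circular wrap-around
    U = np.fft.rfft(u, n=n)
    K = np.fft.rfft(k, n=n)
    y = np.fft.irfft(U * K, n=n)[..., :L]  # causal outputs only
    return y

# Illustrative usage with a hypothetical decaying kernel
# (a stand-in for a learned SSM kernel).
L = 1 << 16                      # 65,536-token toy sequence
u = np.random.randn(L)           # input sequence (one channel)
k = np.exp(-np.arange(L) / 512)  # toy exponentially decaying kernel
y = fft_long_conv(u, k)
print(y.shape)  # (65536,)
```

Because the cost grows only log-linearly with $L$, the same operation scales to the 600K-token inputs mentioned above, whereas a dense attention map at that length would be prohibitively large.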