In this paper, we investigate length extension in state-space models (SSMs) for language modeling. Length extension involves training a model on short sequences and testing it on longer ones. We show that state-space models trained with zero hidden-state initialization have difficulty with length extension. We explain this difficulty by pointing out that length extension is equivalent to polynomial extrapolation. Based on this theory, we propose a simple yet effective method, changing the hidden-state initialization scheme, to improve length extension. Moreover, our method shows that a long training sequence length is beneficial but not necessary for length extension. Changing the hidden-state initialization enables efficient training of long-memory models with a smaller training context length.
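To make the initialization change concrete, the following is a minimal sketch of a diagonal linear SSM scan run twice: once from a zero initial hidden state and once from a randomly sampled one. The recurrence, the parameter shapes, and the choice of sampling the initial state from a Gaussian are illustrative assumptions for exposition, not the paper's exact scheme.

```python
import numpy as np

def ssm_scan(A, B, C, x, h0):
    """Run a diagonal linear SSM: h_t = A * h_{t-1} + B * x_t, y_t = C . h_t."""
    T = x.shape[0]
    h = h0
    ys = np.empty(T)
    for t in range(T):
        h = A * h + B * x[t]   # elementwise recurrence (diagonal A)
        ys[t] = C @ h          # scalar readout
    return ys

d = 16
rng = np.random.default_rng(0)
A = np.exp(-rng.uniform(0.01, 1.0, size=d))  # stable per-dimension decays
B = rng.normal(size=d)
C = rng.normal(size=d)
x = rng.normal(size=128)                     # input sequence

# Zero initialization: the scheme that, per the paper, hinders length extension.
y_zero = ssm_scan(A, B, C, x, h0=np.zeros(d))

# Nonzero initialization: sample h0 so the model starts from states resembling
# those reached mid-sequence (illustrative choice; the paper's scheme may differ).
y_rand = ssm_scan(A, B, C, x, h0=rng.normal(size=d))
```

Under this setup, training with nonzero initial states exposes the model to hidden-state distributions it would otherwise only encounter deep into long sequences, which is the intuition behind decoupling memory length from training context length.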