This paper studies the ability of recurrent neural network sequence-to-sequence (RNN seq2seq) models to learn four transduction tasks: identity, reversal, total reduplication, and quadratic copying. These transductions are traditionally well studied under finite-state transducers and are regarded as increasing in complexity. We find that RNN seq2seq models are only able to approximate a mapping that fits the training or in-distribution data, rather than learning the underlying functions. Although attention makes learning more efficient and robust, it does not overcome the out-of-distribution generalization limitation. We establish a novel complexity hierarchy for learning the four tasks with attention-less RNN seq2seq models, which may be understood in terms of the complexity hierarchy of formal languages rather than that of string transductions. The choice of RNN variant also affects the results. In particular, we show that Simple RNN seq2seq models cannot count the input length.
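For concreteness, the four transductions can be written as plain string functions. The sketch below is illustrative only and assumes the usual definitions, which the abstract itself does not spell out: total reduplication maps w to ww, and quadratic copying maps w to |w| concatenated copies of w. The function names are hypothetical and are not taken from the paper's code.

```python
# Illustrative definitions of the four transduction tasks (assumed, not the authors' code).

def identity(w: str) -> str:
    """Map w to itself: output length n."""
    return w


def reversal(w: str) -> str:
    """Map w to its mirror image: output length n."""
    return w[::-1]


def total_reduplication(w: str) -> str:
    """Map w to ww: output length 2n."""
    return w + w


def quadratic_copying(w: str) -> str:
    """Map w to |w| copies of w: output length n^2 (assumed definition)."""
    return w * len(w)


if __name__ == "__main__":
    for task in (identity, reversal, total_reduplication, quadratic_copying):
        print(f"{task.__name__}('abc') = {task('abc')!r}")
```

Under these definitions the output length grows as n, n, 2n, and n^2 for an input of length n. A model that only fits in-distribution lengths would be expected to fail on inputs longer than those seen in training.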