The recent success of neural networks in natural language processing has drawn renewed attention to learning sequence-to-sequence (seq2seq) tasks. While there exists a rich literature that studies classification and regression tasks using solvable models of neural networks, seq2seq tasks have not yet been studied from this perspective. Here, we propose a simple model for a seq2seq task that has the advantage of providing explicit control over the degree of memory, or non-Markovianity, in the sequences -- the stochastic switching-Ornstein-Uhlenbeck (SSOU) model. We introduce a measure of non-Markovianity to quantify the amount of memory in the sequences. For a minimal auto-regressive (AR) learning model trained on this task, we identify two learning regimes corresponding to distinct phases in the stationary state of the SSOU process. These phases emerge from the interplay between two different time scales that govern the sequence statistics. Moreover, we observe that while increasing the integration window of the AR model always improves performance, albeit with diminishing returns, increasing the non-Markovianity of the input sequences can improve or degrade its performance. Finally, we perform experiments with recurrent and convolutional neural networks that show that our observations carry over to more complicated neural network architectures.
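To make the setup concrete, here is a minimal sketch of the kind of task the abstract describes. The abstract does not give the SSOU dynamics, so everything below is an assumption made for illustration. A particle `x` relaxes toward a trap centre `c` that switches between `+c0` and `-c0`. Waiting times between switches are drawn from a gamma distribution, and its shape parameter `shape_k` stands in for the memory control (`shape_k = 1` gives exponential, memoryless waiting times). The learner is a plain linear least-squares fit over the last `window` values of `x`, standing in for the minimal AR model. All function names, parameter names, and default values are hypothetical.

```python
import numpy as np


def simulate_ssou(n_steps, dt=0.01, kappa=1.0, diffusion=1.0, c0=1.0,
                  shape_k=1.0, mean_wait=1.0, seed=0):
    """Simulate an assumed switching Ornstein-Uhlenbeck process.

    Returns the particle trajectory x and the trap-centre sequence c.
    The dynamics are an illustrative assumption, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps)
    c = np.empty(n_steps)
    sign = 1.0
    # Gamma waiting times with mean `mean_wait`; shape_k = 1 is exponential.
    t_next = rng.gamma(shape_k, mean_wait / shape_k)
    x_prev = 0.0
    for i in range(n_steps):
        t = i * dt
        while t >= t_next:  # flip the trap centre at each switching time
            sign = -sign
            t_next += rng.gamma(shape_k, mean_wait / shape_k)
        c[i] = sign * c0
        # Euler-Maruyama step for dx = -kappa (x - c) dt + sqrt(2 D) dW
        noise = np.sqrt(2.0 * diffusion * dt) * rng.normal()
        x_prev = x_prev - kappa * (x_prev - c[i]) * dt + noise
        x[i] = x_prev
    return x, c


def fit_ar(x, c, window):
    """Least-squares linear map from the last `window` values of x to c[t]."""
    # Row t of X holds x[t - window + 1], ..., x[t].
    X = np.lib.stride_tricks.sliding_window_view(x, window)
    y = c[window - 1:]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    mse = float(np.mean((X @ w - y) ** 2))
    return w, mse
```

Calling `fit_ar` for several values of `window` on sequences generated with different `shape_k` gives a rough way to explore how window size and memory interact in this toy setting. It is not a reproduction of the paper's experiments.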