Recent work has suggested that certain neural network architectures, particularly recurrent neural networks (RNNs) and implicit neural networks (INNs), are capable of logical extrapolation. That is, one may train such a network on easy instances of a specific task and then apply it successfully to more difficult instances of the same task. In this paper, we revisit this idea and show that (i) the capacity for extrapolation is less robust than previously suggested. Specifically, in the context of a maze-solving task, we show that while INNs (and some RNNs) are capable of generalizing to larger maze instances, they fail to generalize along axes of difficulty other than maze size. (ii) Models that are explicitly trained to converge to a fixed point (e.g., the INN we test) are likely to do so when extrapolating, while models that are not (e.g., the RNN we test) may exhibit more exotic limiting behavior, such as limit cycles, even when they correctly solve the problem. Our results suggest that (i) further study is needed into why such networks extrapolate easily along certain axes of difficulty yet struggle along others, and (ii) analyzing the dynamics of extrapolation may yield insights into designing more efficient and interpretable logical extrapolators.
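To make the fixed-point versus limit-cycle distinction concrete, the following is a minimal sketch (not the paper's code) of how the limiting behavior of an iterated update can be classified numerically. The function `classify_limit` and the two toy maps are illustrative assumptions standing in for repeated application of an INN's or RNN's recurrent module.

```python
# Minimal sketch (illustrative, not the paper's code): classify the limiting
# behavior of an iterated update h <- step(h) as a fixed point, a short limit
# cycle, or neither. In the paper's setting, `step` would correspond to one
# application of the network's recurrent module.
import numpy as np

def classify_limit(step, h0, n_iters=1000, tol=1e-6, max_period=8):
    history = [h0]
    for _ in range(n_iters):
        history.append(step(history[-1]))
        # Fixed point: successive iterates stop moving.
        if np.linalg.norm(history[-1] - history[-2]) < tol:
            return "fixed point"
        # Limit cycle: the current iterate revisits a state from p steps ago.
        for p in range(2, min(max_period, len(history) - 1) + 1):
            if np.linalg.norm(history[-1] - history[-1 - p]) < tol:
                return f"limit cycle (period {p})"
    return "no convergence detected"

# A contractive affine map settles to a fixed point, mimicking a model trained
# to converge (the INN) ...
print(classify_limit(lambda h: 0.5 * h + 1.0, np.ones(2)))    # fixed point
# ... while a 90-degree rotation revisits earlier states instead of settling,
# mimicking the more exotic limiting behavior observed in the RNN.
R = np.array([[0.0, -1.0], [1.0, 0.0]])
print(classify_limit(lambda h: R @ h, np.array([1.0, 0.0])))  # limit cycle (period 4)
```

Note that both behaviors are compatible with correct task performance; the point of the sketch is only that the two regimes are easy to distinguish by monitoring the iterates.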