We propose NeuralThink, a novel deep thinking architecture that can efficiently and consistently extrapolate, i.e., learn algorithms from smaller problems (in terms of observation size) and execute those algorithms on larger problems. In contrast to previous deep thinking architectures, NeuralThink applies naturally both to same-size problems, where the input and output sizes match, and to different-size problems, where the input and output sizes differ. To achieve this versatility, we design NeuralThink with three main components: a recurrent module that iteratively processes input information at different scales, a processing module that aggregates the previously processed information, and a curriculum-based training scheme that improves the extrapolation performance of the method. To evaluate our method, we introduce a set of novel different-size tasks and show that NeuralThink consistently outperforms prior state-of-the-art deep thinking approaches in extrapolating to larger problems, while training on smaller problems and requiring fewer parameters than other approaches.
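To make the three-component design concrete, the sketch below renders it in PyTorch. This is a minimal illustration under stated assumptions: the module sizes, the exact recurrence, the pooling head, and the curriculum schedule are all hypothetical choices for exposition, not the paper's implementation.

```python
import torch
import torch.nn as nn

class NeuralThinkSketch(nn.Module):
    """Hypothetical sketch of the three-component design described above.
    Layer widths, the recurrence, and the aggregation head are illustrative
    assumptions, not the authors' actual architecture."""

    def __init__(self, channels: int = 32):
        super().__init__()
        self.embed = nn.Conv2d(1, channels, 3, padding=1)
        # Recurrent module: iteratively refines a hidden state over the input.
        # Convolutions keep the weights independent of the observation size,
        # so the same learned procedure can run on larger test problems.
        self.recur = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        # Processing module: aggregates the iteratively refined features into
        # the output; here, a pooled head for a different-size task whose
        # output is a single fixed-size prediction.
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(channels, 2)
        )

    def forward(self, x: torch.Tensor, n_iters: int) -> torch.Tensor:
        feats = self.embed(x)                # project the input once
        state = torch.zeros_like(feats)      # initial hidden state
        for _ in range(n_iters):             # "think" for n_iters steps
            state = self.recur(torch.cat([state, feats], dim=1))
        return self.head(state)

# Curriculum-style training loop (assumption): problem size and iteration
# budget grow over training; data here is a random stand-in.
model = NeuralThinkSketch()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for size, iters in [(8, 10), (12, 20), (16, 30)]:  # illustrative schedule
    x = torch.rand(4, 1, size, size)
    y = torch.randint(0, 2, (4,))
    loss = nn.functional.cross_entropy(model(x, iters), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the recurrent module is fully convolutional, the same weights accept larger observations at test time, and extrapolation amounts to running more "thinking" iterations on the bigger input.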