Adaptive recurrent vision performs zero-shot computation scaling to unseen difficulty levels

Humans solving algorithmic or reasoning problems typically exhibit solution times that grow with problem difficulty. Adaptive recurrent neural networks have been shown to exhibit this property on various language-processing tasks. However, little work has assessed whether such adaptive computation can also enable vision models to extrapolate solutions beyond the difficulty levels of their training distribution, and prior work has focused on very simple tasks. In this study, we investigate a critical functional role of adaptive processing in recurrent neural networks: dynamically scaling computational resources to the requirements of the input, which allows zero-shot generalization to novel difficulty levels not seen during training. We study this on two challenging visual reasoning tasks, PathFinder and Mazes. We combine convolutional recurrent neural networks (ConvRNNs) with a learnable halting mechanism based on Graves (2016). We explore several implementations of such adaptive ConvRNNs (AdRNNs), ranging from tying weights across layers to more sophisticated biologically inspired recurrent networks with lateral connections and gating. We show that (1) AdRNNs learn to halt processing early (or late) to solve easier (or harder) problems, and (2) these RNNs generalize zero-shot to more difficult problem settings not shown during training by dynamically increasing the number of recurrent iterations at test time. Our study provides modeling evidence for the hypothesis that recurrent processing confers the functional advantage of adaptively allocating compute resources conditional on input requirements, and hence allows generalization to harder difficulty levels of a visual reasoning problem without training on them.
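To make the halting idea concrete, the following is a minimal sketch of the halting rule from Graves (2016), on which the abstract says the mechanism is based. It is not the authors' implementation. The names `act_rollout`, `step_fn`, `halt_fn`, and `run_toy` are hypothetical. The recurrent update is a toy countdown rather than a ConvRNN, and the halting unit is hand-set rather than learned. The sketch only illustrates how accumulating a per-step halting probability until it reaches 1 - eps yields an input-dependent number of iterations.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def act_rollout(state, step_fn, halt_fn, max_steps=50, eps=0.01):
    """Iterate step_fn until cumulative halting probability reaches 1 - eps.

    Returns (halting-weighted mean state, steps used, ponder cost), where
    ponder cost = steps used + remainder, as in Graves (2016).
    """
    cumulative = 0.0                    # sum of halting probabilities so far
    weighted = [0.0] * len(state)       # halting-weighted average of states
    for n in range(1, max_steps + 1):
        state = step_fn(state)
        h = halt_fn(state)
        # Stop when the threshold is crossed or the step budget is exhausted.
        last = (cumulative + h >= 1.0 - eps) or (n == max_steps)
        # The final step receives the remainder so the weights sum to 1.
        p = (1.0 - cumulative) if last else h
        weighted = [w + p * s for w, s in zip(weighted, state)]
        if last:
            remainder = 1.0 - cumulative
            return weighted, n, n + remainder
        cumulative += h


def run_toy(difficulty):
    # Toy assumption: "difficulty" is the amount of remaining work, and each
    # recurrent iteration removes one unit of it.
    step_fn = lambda s: [max(s[0] - 1.0, 0.0)]
    # Hand-set halting unit (learned in a real model): near 1 when no work
    # remains, near 0 otherwise.
    halt_fn = lambda s: sigmoid(10.0 - 20.0 * s[0])
    return act_rollout([float(difficulty)], step_fn, halt_fn)


if __name__ == "__main__":
    for d in (2, 5, 12):
        _, steps, ponder = run_toy(d)
        print(f"difficulty={d} steps={steps} ponder_cost={ponder:.4f}")
```

In this toy setting the number of iterations tracks the input's difficulty with no change to the update or halting functions, which is the behavior the abstract describes for AdRNNs at test time. In a trained model the ponder cost would typically be added to the task loss to discourage unnecessary iterations.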
