When solving a problem, humans adapt the type of information they use, the procedure they follow, and the amount of time they spend on it. Most standard neural networks, however, apply the same function type and a fixed computation budget to every sample, regardless of its nature and difficulty. Adaptivity is a powerful paradigm: it gives practitioners flexibility in how these models are used downstream, and it can also serve as a strong inductive bias for solving certain challenging classes of problems. In this work, we propose a new strategy, AdaTape, that enables dynamic computation in neural networks via adaptive tape tokens. AdaTape employs an elastic input sequence by equipping an existing architecture with a dynamic read-and-write tape. Specifically, we adaptively generate input sequences using tape tokens obtained from a tape bank, which can either be trainable or generated from the input data. We analyze the challenges and requirements of obtaining dynamic sequence content and length, and propose the Adaptive Tape Reader (ATR) algorithm to achieve both objectives. Through extensive experiments on image recognition tasks, we show that AdaTape achieves better performance without increasing the computational cost.
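To make the idea of an elastic input sequence concrete, here is a minimal sketch in plain Python. It is not the ATR algorithm, whose details the abstract does not give. The selection rule (softmax over dot-product scores against a mean-pooled query, with a cumulative-mass stopping threshold) and every name in it (`select_tape_tokens`, `build_elastic_sequence`, `max_tape_tokens`, `halt_threshold`) are assumptions made for illustration. It shows only that the number of tape tokens appended to the input can vary per sample.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_pool(tokens: Sequence[Vector]) -> List[float]:
    """Average the token vectors into a single query vector."""
    n = len(tokens)
    return [sum(tok[d] for tok in tokens) / n for d in range(len(tokens[0]))]


def select_tape_tokens(
    input_tokens: Sequence[Vector],
    tape_bank: Sequence[Vector],
    max_tape_tokens: int,
    halt_threshold: float,
) -> List[int]:
    """Hypothetical selection rule (an assumption, not the paper's ATR).

    Scores each bank token against the pooled input, then greedily takes
    the highest-scoring tokens until their softmax mass reaches
    `halt_threshold` or `max_tape_tokens` tokens have been taken.
    """
    query = mean_pool(input_tokens)
    scores = [dot(query, tok) for tok in tape_bank]

    # Numerically stable softmax over the bank scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]

    order = sorted(range(len(tape_bank)), key=lambda i: probs[i], reverse=True)
    chosen: List[int] = []
    mass = 0.0
    for i in order:
        if len(chosen) >= max_tape_tokens or mass >= halt_threshold:
            break
        chosen.append(i)
        mass += probs[i]
    return chosen


def build_elastic_sequence(
    input_tokens: Sequence[Vector],
    tape_bank: Sequence[Vector],
    max_tape_tokens: int = 4,
    halt_threshold: float = 0.9,
) -> List[Vector]:
    """Append the selected tape tokens to the input sequence."""
    picked = select_tape_tokens(
        input_tokens, tape_bank, max_tape_tokens, halt_threshold
    )
    return list(input_tokens) + [tape_bank[i] for i in picked]


# Toy bank of four 2-d tape tokens (illustrative values only).
bank = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
peaked_input = [[10.0, 0.0], [10.0, 0.0]]  # strongly matches one bank token
flat_input = [[1.0, 0.0], [-1.0, 0.0]]     # pooled query is zero, scores tie

short_seq = build_elastic_sequence(peaked_input, bank)
long_seq = build_elastic_sequence(flat_input, bank)
print(len(short_seq), len(long_seq))
```

In this toy setup, the input that matches one bank token closely gets a shorter sequence than the input whose scores are tied, so sequence length, and with it the downstream compute, depends on the sample. In a real model the bank would hold learned or input-derived embeddings, and the selection would have to be trainable end to end.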