We explore the hypothesis that poor compositional generalization in neural networks is caused by difficulty in learning effective routing. To address this, we propose the concept of block-operations, which splits all activation tensors in the network into uniformly sized blocks and applies an inductive bias that encourages modular routing and modification of these blocks. Based on this concept, we introduce the Multiplexer, a new architectural component that enhances the Feed Forward Neural Network (FNN). We experimentally confirm that Multiplexers exhibit strong compositional generalization: on both a synthetic and a realistic task, our model learned the underlying process behind the task, whereas both FNNs and Transformers learned only heuristic approximations. As future work, we propose applying the principles of block-operations to improve other existing architectures.