Deep neural networks have made tremendous gains in emulating human-like intelligence, and have increasingly been used as a way of understanding how the brain may solve the complex computational problems on which this relies. However, they still fall short of the strong forms of generalization of which humans are capable, and therefore fail to provide insight into how the brain supports these. One such case is out-of-distribution (OOD) generalization: successful performance on test examples that lie outside the distribution of the training set. Here, we identify properties of processing in the brain that may contribute to this ability. We describe a two-part algorithm that draws on specific features of neural computation to achieve OOD generalization, and provide a proof of concept by evaluating performance on two challenging cognitive tasks. First, we draw on the fact that the mammalian brain represents metric spaces using a grid cell code (e.g., in entorhinal cortex): abstract representations of relational structure, organized in recurring motifs that cover the representational space. Second, we propose an attentional mechanism that operates over the grid cell code using a Determinantal Point Process (DPP), which we call DPP attention (DPP-A): a transformation that ensures maximum sparseness in the coverage of that space. We show that a loss function that combines standard task-optimized error with DPP-A can exploit the recurring motifs in the grid cell code, and can be integrated with common architectures to achieve strong OOD generalization performance on analogy and arithmetic tasks. This provides an interpretation of how the grid cell code in the mammalian brain may contribute to generalization performance and, at the same time, suggests a potential means for improving such capabilities in artificial neural networks.
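To make the DPP intuition concrete, the sketch below illustrates the core idea behind DPP-A as described above: selecting a maximally diverse subset of units by (greedily) maximizing the log-determinant of a similarity kernel over their responses. It is a minimal illustration under assumed inputs, not the paper's DPP-A implementation; the grid-cell-like embeddings (module periods, feature construction) and the subset size `k` are hypothetical choices made only for this example.

```python
import numpy as np

def greedy_dpp_map(L, k):
    """Greedy MAP inference for a DPP: iteratively add the item that most
    increases the log-determinant of the selected kernel submatrix,
    yielding an (approximately) maximally diverse subset of k items."""
    n = L.shape[0]
    selected = []
    for _ in range(k):
        best_item, best_logdet = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sub = L[np.ix_(idx, idx)]
            # Small ridge term keeps the submatrix numerically positive definite.
            sign, logdet = np.linalg.slogdet(sub + 1e-6 * np.eye(len(idx)))
            if sign > 0 and logdet > best_logdet:
                best_item, best_logdet = i, logdet
        selected.append(best_item)
    return selected

# Hypothetical grid-cell-like embeddings: a few "modules", each coding
# position periodically at a different spatial frequency (assumed values).
rng = np.random.default_rng(0)
positions = np.arange(20)
freqs = np.array([1/3, 1/5, 1/7, 1/11])
feats = np.concatenate(
    [np.stack([np.sin(2 * np.pi * f * positions),
               np.cos(2 * np.pi * f * positions)], axis=1) for f in freqs],
    axis=1)                      # shape (20, 8): one embedding per position
L = feats @ feats.T              # similarity kernel over items
print(greedy_dpp_map(L, k=4))    # indices of a diverse subset
```

In the full model, a differentiable DPP-A term of this kind is combined with the standard task loss, so that attention over the grid cell code is encouraged to cover the representational space as sparsely and diversely as possible.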