Determinantal Point Process Attention Over Grid Cell Code Supports Out of Distribution Generalization

Deep neural networks have made tremendous gains in emulating human-like intelligence, and are increasingly used as a means of understanding how the brain may solve the complex computational problems on which such intelligence relies. However, they still fall short of, and therefore fail to provide insight into, how the brain supports the strong forms of generalization of which humans are capable. One such case is out-of-distribution (OOD) generalization: successful performance on test examples that lie outside the distribution of the training set. Here, we identify properties of processing in the brain that may contribute to this ability. We describe a two-part algorithm that draws on specific features of neural computation to achieve OOD generalization, and provide a proof of concept by evaluating performance on two challenging cognitive tasks. First, we draw on the fact that the mammalian brain represents metric spaces using a grid cell code (e.g., in the entorhinal cortex): abstract representations of relational structure, organized in recurring motifs that cover the representational space. Second, we propose an attentional mechanism that operates over the grid cell code using a determinantal point process (DPP), which we call DPP attention (DPP-A), a transformation that ensures maximum sparseness in the coverage of that space. We show that a loss function that combines standard task-optimized error with DPP-A can exploit the recurring motifs in the grid cell code, and can be integrated with common architectures to achieve strong OOD generalization performance on analogy and arithmetic tasks. This provides both an interpretation of how the grid cell code in the mammalian brain may contribute to generalization performance and a potential means of improving such capabilities in artificial neural networks.
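To make the idea of DPP-style selection over a periodic code concrete, the following is a minimal sketch in Python and is not the authors' implementation. It assumes a toy one-dimensional grid code built from cosine-tuned units, and it stands in for DPP-A with a greedy search that approximately maximizes the log-determinant of the covariance submatrix of the selected units. The function names (`grid_code`, `greedy_dpp_select`), the periods, the number of phases, and the number of attended units are all hypothetical choices made for illustration.

```python
import numpy as np


def grid_code(positions, periods, n_phases=4):
    """Toy 1-D periodic code: one cosine-tuned unit per (period, phase) pair.

    This is an illustrative stand-in for a grid cell code, not a model of
    entorhinal cortex.
    """
    positions = np.asarray(positions, dtype=float)[:, None, None]
    periods = np.asarray(periods, dtype=float)[None, :, None]
    phases = (np.arange(n_phases) / n_phases)[None, None, :]
    responses = np.cos(2.0 * np.pi * (positions / periods - phases))
    # Flatten (period, phase) into one unit axis: index = period_idx * n_phases + phase_idx
    return responses.reshape(responses.shape[0], -1)


def greedy_dpp_select(kernel, k, jitter=1e-6):
    """Greedily pick k indices that approximately maximize log det of the submatrix.

    Maximizing the determinant favors units that are high-variance and
    mutually decorrelated, which is the diversity property a DPP rewards.
    """
    n = kernel.shape[0]
    kernel = kernel + jitter * np.eye(n)  # keep submatrices numerically positive definite
    selected = []
    for _ in range(k):
        best_idx, best_val = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(kernel[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_val:
                best_idx, best_val = i, logdet
        if best_idx is None:
            break
        selected.append(best_idx)
    return selected


# Hypothetical training range and periods, chosen only for illustration.
train_positions = np.arange(0, 100)
periods = [3.0, 7.0, 17.0, 41.0]
n_phases = 4

codes = grid_code(train_positions, periods, n_phases)  # shape (100, 16)
kernel = np.cov(codes, rowvar=False)                   # unit-by-unit covariance over training range
attended = greedy_dpp_select(kernel, k=6)

# Binary attention mask over units; the masked code would feed a downstream network.
mask = np.zeros(codes.shape[1])
mask[attended] = 1.0
attended_codes = codes * mask
```

In this sketch the attention is a hard binary mask computed once from the training range. A learned, differentiable gate trained jointly with a task loss, as the abstract describes, would replace the greedy search; that part is not shown here.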
