Understanding spatial location and relationships is a fundamental capability for modern artificial intelligence systems, and insights from human spatial cognition provide valuable guidance in this domain. Recent neuroscientific discoveries have highlighted the role of grid cells as a fundamental neural component for spatial representation, supporting distance computation, path integration, and scale discernment. In this paper, we introduce a novel positional encoding scheme inspired by Fourier analysis and recent findings on grid cells in computational neuroscience. Assuming that grid cells encode spatial position through a summation of Fourier basis functions, we demonstrate that the inner product of grid representations is translationally invariant. Additionally, we derive an optimal grid scale ratio for multi-dimensional Euclidean spaces based on principles of biological efficiency. Building on these computational principles, we develop a **Grid**-cell-inspired **Positional Encoding** technique, termed **GridPE**, for encoding locations within high-dimensional spaces, and integrate it into the Pyramid Vision Transformer architecture. Our theoretical analysis shows that GridPE provides a unifying framework for positional encoding in arbitrary high-dimensional spaces. Experimental results demonstrate that GridPE significantly enhances the performance of transformers, underscoring the importance of incorporating neuroscientific insights into the design of artificial intelligence systems.
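
The translational invariance claim can be illustrated with a minimal sketch. It is not the GridPE implementation: it only assumes that a position is encoded as cosine and sine pairs of its projection onto a set of wave vectors, so that the inner product of two encodings reduces to a sum of cos(k · (x − y)) and depends only on the displacement. The helper names (`fourier_encode`, `make_wave_vectors`), the three directions spaced 60 degrees apart, and the scale ratio of 1.5 are illustrative assumptions, not values taken from the paper.

```python
import math


def fourier_encode(pos, wave_vectors):
    """Encode a position as (cos(k.x), sin(k.x)) pairs, one pair per wave vector."""
    features = []
    for k in wave_vectors:
        phase = sum(ki * xi for ki, xi in zip(k, pos))
        features.extend((math.cos(phase), math.sin(phase)))
    return features


def make_wave_vectors(num_scales=4, base_freq=1.0, scale_ratio=1.5):
    """Hypothetical 2D wave vectors: three directions 60 degrees apart per scale.

    scale_ratio is an arbitrary placeholder, NOT the optimal ratio derived in the paper.
    """
    vectors = []
    for s in range(num_scales):
        freq = base_freq / (scale_ratio ** s)  # coarser grid at each successive scale
        for d in range(3):
            angle = d * math.pi / 3
            vectors.append((freq * math.cos(angle), freq * math.sin(angle)))
    return vectors


def inner(a, b):
    """Plain dot product of two equal-length feature lists."""
    return sum(u * v for u, v in zip(a, b))


wave_vectors = make_wave_vectors()
x, y, shift = (0.3, -1.2), (2.0, 0.7), (5.0, -3.5)

# Inner product of the encodings at the original positions.
before = inner(fourier_encode(x, wave_vectors), fourier_encode(y, wave_vectors))

# Translate both positions by the same offset and recompute.
x_shifted = tuple(a + b for a, b in zip(x, shift))
y_shifted = tuple(a + b for a, b in zip(y, shift))
after = inner(fourier_encode(x_shifted, wave_vectors),
              fourier_encode(y_shifted, wave_vectors))

# The two values agree up to floating-point error.
print(abs(before - after) < 1e-9)
```

Because each cosine and sine pair contributes cos(k · x)cos(k · y) + sin(k · x)sin(k · y) = cos(k · (x − y)), the common shift cancels term by term. This is the same property that makes such an encoding suitable for attention scores, which are computed as inner products.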