Foveated vision, a trait shared by many animals, including humans, remains underused in machine learning applications despite its significant contribution to biological visual function. This study investigates whether retinotopic mapping, a critical component of foveated vision, can enhance image categorization and localization performance when integrated into deep convolutional neural networks (CNNs). Retinotopic mapping was applied to the inputs of standard off-the-shelf CNNs, which were then retrained on the ImageNet task. As expected, the logarithmic-polar mapping improved the networks' ability to handle arbitrary image zooms and rotations, particularly for isolated objects. Surprisingly, the retinotopically mapped networks achieved classification performance comparable to that of the standard networks. Furthermore, the networks localized classified objects better when the foveal center of the transform was shifted. This replicates a crucial ability of the human visual system that is absent in typical CNNs. These findings suggest that retinotopic mapping may be fundamental to significant preattentive visual processes.
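To make the logarithmic-polar mapping concrete, the following is a minimal sketch of how an image could be resampled onto a log-polar grid before being passed to a CNN. It is an illustration under stated assumptions, not the implementation used in the study: the function name `log_polar_map`, its parameters (`out_shape`, `center`, `r_min`, `r_max`), the nearest-neighbour sampling, and the zero-filling of samples that fall outside the image are all hypothetical choices made here for brevity.

```python
import numpy as np


def log_polar_map(image, out_shape=(64, 64), center=None, r_min=1.0, r_max=None):
    """Resample an (H, W) or (H, W, C) image onto a log-polar grid.

    Hypothetical helper for illustration only. Rows of the output index
    log-spaced radii, columns index angles around `center` (the fovea).
    """
    h, w = image.shape[:2]
    if center is None:
        # Assumption: by default the fovea sits at the image center.
        center = ((h - 1) / 2.0, (w - 1) / 2.0)
    if r_max is None:
        # Assumption: the outermost ring spans half the smaller image side.
        r_max = min(h, w) / 2.0
    cy, cx = center
    n_r, n_theta = out_shape

    # Geometrically spaced radii: dense near the fovea, sparse in the periphery.
    radii = np.geomspace(r_min, r_max, n_r)
    angles = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    rr, aa = np.meshgrid(radii, angles, indexing="ij")

    # Nearest-neighbour lookup of the source pixel for every output sample.
    ys = np.rint(cy + rr * np.sin(aa)).astype(int)
    xs = np.rint(cx + rr * np.cos(aa)).astype(int)

    # Samples that fall outside the image stay at zero.
    inside = (ys >= 0) & (ys < h) & (xs >= 0) & (xs < w)
    out = np.zeros(tuple(out_shape) + image.shape[2:], dtype=image.dtype)
    out[inside] = image[ys[inside], xs[inside]]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
    centered = log_polar_map(img)                    # fovea at the image center
    shifted = log_polar_map(img, center=(60, 150))   # fovea moved off-center
    print(centered.shape, shifted.shape)
```

In these coordinates, a rotation of the image about the fovea becomes a circular shift along the angle axis, and a zoom becomes a shift along the log-radius axis, which is one way to read the robustness to zooms and rotations reported above. Changing `center` corresponds to shifting the foveal center of the transform.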