Human eye movements in visual recognition reflect a balance between foveal sampling and peripheral context. Task-driven hard-attention models for vision are often evaluated by how well their scanpaths match human gaze. However, common scanpath metrics can be strongly confounded by dataset-specific center bias, especially on object-centric datasets. Using Gaze-CIFAR-10, we show that a trivial center-fixation baseline achieves surprisingly strong scanpath scores, approaching those of many learned policies. This makes standard metrics overly optimistic and blurs the distinction between genuine behavioral alignment and mere central tendency. We then analyze a hard-attention classifier under constrained vision by sweeping foveal patch size and peripheral context, revealing a peripheral sweet spot: only a narrow range of sensory constraints yields scanpaths that are simultaneously (i) above the center baseline after debiasing and (ii) temporally human-like in movement statistics. To address center bias, we propose the Gaze Consistency Score (GCS), a center-debiased composite metric augmented with movement similarity. GCS uncovers a robust sweet spot at medium patch size with both foveal and peripheral vision that is not obvious from raw scanpath metrics or accuracy alone, and it also highlights a "shortcut regime" when the field of view becomes too large. We discuss implications for evaluating active perception on object-centric datasets and for designing gaze benchmarks that better separate behavioral alignment from center bias.
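The center-bias confound described above can be illustrated with a minimal, hypothetical sketch (not the paper's data, metric, or model): when human fixations on an object-centric dataset cluster near the image center, a static center-fixation baseline can match them as closely as a policy that actually moves its gaze. The scanpath-similarity measure below (mean Euclidean distance between time-aligned fixations) is an assumed stand-in for the standard scanpath metrics discussed in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, center-biased human scanpath: fixations cluster tightly
# around the center of a 32x32 image (as on object-centric datasets).
img_size = 32.0
center = np.array([img_size / 2, img_size / 2])
human = center + rng.normal(scale=3.0, size=(8, 2))

# Trivial baseline: fixate the exact center at every step.
center_baseline = np.tile(center, (8, 1))

# A gaze policy that explores away from the center.
explorer = center + rng.normal(scale=10.0, size=(8, 2))

def scanpath_distance(a, b):
    """Mean Euclidean distance between time-aligned fixations (lower = more similar)."""
    return float(np.linalg.norm(a - b, axis=1).mean())

d_center = scanpath_distance(center_baseline, human)
d_explorer = scanpath_distance(explorer, human)

# On center-biased gaze, the static baseline scores as well as or better
# than the moving policy, so the raw metric rewards central tendency.
print(f"center baseline: {d_center:.2f}, exploring policy: {d_explorer:.2f}")
```

This is why a center-debiased score such as GCS is needed: under center bias, a raw distance metric cannot distinguish genuine behavioral alignment from simply staying at the center.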