Active vision enables dynamic visual perception, offering an alternative to the static feedforward architectures common in computer vision, which rely on large datasets and substantial computational resources. Biological selective attention mechanisms allow agents to focus on salient Regions of Interest (ROIs), reducing computational demand while maintaining real-time responsiveness. Event-based cameras, inspired by the mammalian retina, enhance this capability by capturing asynchronous scene changes, enabling efficient, low-latency processing. To distinguish moving objects while the event-based camera is itself in motion, the agent requires an object motion segmentation mechanism to accurately detect targets and center them in the visual field (fovea). Integrating event-based sensors with neuromorphic algorithms represents a paradigm shift, using Spiking Neural Networks to parallelize computation and adapt to dynamic environments. This work presents a bioinspired Spiking Convolutional Neural Network attention system that achieves selective attention through object motion sensitivity. The system generates events via fixational eye movements using a Dynamic Vision Sensor integrated into the Speck neuromorphic hardware, mounted on a Pan-Tilt unit, to identify the ROI and saccade toward it. Characterized with ideal gratings and benchmarked on the Event Camera Motion Segmentation Dataset, the system reaches a mean IoU of 82.2% and a mean SSIM of 96% in multi-object motion segmentation. On the Event-Assisted Low-Light Video Object Segmentation Dataset, salient-object detection reaches 88.8% accuracy in office scenarios and 89.8% in low-light conditions. A real-time demonstrator shows the system's 0.12 s response to dynamic scenes. Its learning-free design ensures robustness across perceptual scenes, making it a reliable foundation for real-time robotic applications and a basis for more complex architectures.
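The mean IoU metric used above to benchmark multi-object motion segmentation can be illustrated with a minimal sketch; the function name and mask inputs are illustrative assumptions, not taken from the paper's code:

```python
import numpy as np

def mean_iou(pred_masks, gt_masks):
    """Mean Intersection-over-Union over paired binary segmentation masks.

    pred_masks, gt_masks: sequences of boolean arrays of equal shape,
    one pair per segmented object (illustrative interface).
    """
    ious = []
    for pred, gt in zip(pred_masks, gt_masks):
        intersection = np.logical_and(pred, gt).sum()
        union = np.logical_or(pred, gt).sum()
        # Convention: two empty masks count as a perfect match.
        ious.append(intersection / union if union > 0 else 1.0)
    return float(np.mean(ious))
```

For example, a predicted mask covering two pixels where the ground truth covers one of them yields an IoU of 0.5 for that object; the reported score averages this ratio over all evaluated objects.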