Modelling Human Visual Motion Processing with Trainable Motion Energy Sensing and a Self-attention Network for Adaptive Motion Integration

Visual motion processing is essential for organisms to perceive and interact with dynamic environments. Despite extensive research in cognitive neuroscience, image-computable models that can extract informative motion flow from natural scenes in a manner consistent with human visual processing have yet to be established. Meanwhile, recent advances in computer vision (CV), propelled by deep learning, have led to significant progress in optical flow estimation, a task closely related to motion perception. Here we propose an image-computable model of human motion perception that bridges the gap between models of human vision and CV models. Specifically, we introduce a novel two-stage approach that combines trainable motion energy sensing with a recurrent self-attention network for adaptive motion integration and segregation. This architecture aims to capture the computations in V1-MT, the core structure for motion perception in the biological visual system. In silico neurophysiology reveals that our model's unit responses resemble mammalian neural recordings in motion pooling and speed tuning. The proposed model can also replicate human responses to a range of stimuli examined in past psychophysical studies. Experimental results on the Sintel benchmark demonstrate that our model predicts human responses better than it predicts the ground truth, whereas the CV models show the opposite pattern. A further partial correlation analysis indicates that our model outperforms several state-of-the-art CV models in explaining the human responses that deviate from the ground truth. Our study provides a computational architecture consistent with human visual motion processing, although the physiological correspondence may not be exact.
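The partial correlation analysis mentioned in the abstract can be illustrated with a minimal sketch. It assumes scalar responses (for example, one flow component per trial) and synthetic data; the function name `partial_correlation` and the toy variables `model_a` and `model_b` are hypothetical, and this is not the authors' actual analysis pipeline. The idea is to regress the ground truth out of both the model prediction and the human response, then correlate the residuals, so that only agreement on the deviations from ground truth is measured.

```python
import numpy as np


def partial_correlation(x, y, z):
    """Correlation between x and y after linearly regressing z out of both."""
    x, y, z = (np.asarray(a, dtype=float).ravel() for a in (x, y, z))
    # Design matrix with an intercept so the residuals are mean-centred.
    design = np.column_stack([np.ones_like(z), z])
    # Residuals of x and y after least-squares projection onto [1, z].
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(res_x @ res_y / np.sqrt((res_x @ res_x) * (res_y @ res_y)))


# Synthetic toy data (an assumption for illustration only).
rng = np.random.default_rng(0)
n_trials = 500
ground_truth = rng.normal(size=n_trials)
human_bias = rng.normal(size=n_trials)          # systematic human deviation
human = ground_truth + 0.5 * human_bias

# model_a shares part of the human deviation; model_b only tracks ground truth.
model_a = ground_truth + 0.4 * human_bias + 0.1 * rng.normal(size=n_trials)
model_b = ground_truth + 0.1 * rng.normal(size=n_trials)

pc_a = partial_correlation(model_a, human, ground_truth)
pc_b = partial_correlation(model_b, human, ground_truth)
print(f"model_a partial correlation: {pc_a:.3f}")
print(f"model_b partial correlation: {pc_b:.3f}")
```

In this toy setting both models correlate strongly with the human responses in the ordinary sense, because both track the ground truth. Only the partial correlation separates a model that shares the human deviations from one that does not.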

