When interacting in a three-dimensional world, humans must estimate 3D structure from visual inputs projected down to two-dimensional retinal images. Humans have been shown to use the persistence of object shape over motion-induced transformations as a cue to resolve depth ambiguity when solving this underconstrained problem. With the aim of understanding how biological vision systems may internally represent 3D transformations, we propose a computational model, based on a generative manifold model, that can be used to infer 3D structure from the motion of 2D points. Our model can also learn representations of the transformations with minimal supervision, providing a proof of concept for how humans may develop internal representations on a developmental or evolutionary time scale. Focusing on rotational motion, we show how our model infers depth from moving 2D projected points, how it learns 3D rotational transformations from 2D training stimuli, and how its performance compares to human performance on psychophysical structure-from-motion experiments.
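The depth ambiguity described above, and the way motion helps resolve it, can be illustrated with a minimal sketch. This is not the generative manifold model proposed here; it only assumes orthographic projection and a known rotation about the vertical axis, and the point coordinates, angles, and helper names (`rotate_y`, `project`, `frames`) are hypothetical choices made for illustration.

```python
import math

def rotate_y(point, angle):
    """Rotate a 3D point (x, y, z) about the vertical (y) axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)

def project(point):
    """Orthographic projection (an assumption of this sketch): drop the depth z."""
    x, y, _ = point
    return (x, y)

def frames(shape, angles):
    """Project the shape after rotating it by each angle, one 2D frame per angle."""
    return [[project(rotate_y(p, a)) for p in shape] for a in angles]

# Two hypothetical shapes with identical image coordinates but opposite depths.
shape_a = [(1.0, 0.0, 0.5), (0.0, 1.0, -0.5), (-1.0, 0.5, 1.0)]
shape_b = [(x, y, -z) for (x, y, z) in shape_a]

angles = [0.0, 0.2]  # illustrative rotation angles in radians
frames_a = frames(shape_a, angles)
frames_b = frames(shape_b, angles)

# A single static frame cannot tell the two shapes apart ...
same_first_frame = frames_a[0] == frames_b[0]
# ... but after the same rotation their projections differ, exposing depth.
same_second_frame = frames_a[1] == frames_b[1]

print(same_first_frame, same_second_frame)
```

Note that the sketch relies on the rotation being known: under orthographic projection, a depth-reversed shape rotating in the opposite direction produces the same image sequence, so motion alone leaves that particular ambiguity unresolved.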