How does the primate brain combine generative and discriminative computations in vision?

Benjamin Peters, James J. DiCarlo, Todd Gureckis, Ralf Haefner, Leyla Isik, Joshua Tenenbaum, Talia Konkle, Thomas Naselaris, Kimberly Stachenfeld, Zenna Tavares, Doris Tsao, Ilker Yildirim, Nikolaus Kriegeskorte

Vision is widely understood as an inference problem. However, two contrasting conceptions of the inference process have each been influential in research on biological vision as well as the engineering of machine vision. The first emphasizes bottom-up signal flow, describing vision as a largely feedforward, discriminative inference process that filters and transforms the visual information to remove irrelevant variation and represent behaviorally relevant information in a format suitable for downstream functions of cognition and behavioral control. In this conception, vision is driven by the sensory data, and perception is direct because the processing proceeds from the data to the latent variables of interest. The notion of "inference" in this conception is that of the engineering literature on neural networks, where feedforward convolutional neural networks processing images are said to perform inference. The alternative conception is that of vision as an inference process in Helmholtz's sense, where the sensory evidence is evaluated in the context of a generative model of the causal processes giving rise to it. In this conception, vision inverts a generative model through an interrogation of the evidence in a process often thought to involve top-down predictions of sensory data to evaluate the likelihood of alternative hypotheses. The authors include scientists rooted in each conception in roughly equal numbers, motivated to overcome what might be a false dichotomy between them and to engage the other perspective in both theory and experiment. The primate brain employs an unknown algorithm that may combine the advantages of both conceptions. We explain and clarify the terminology, review the key empirical evidence, and propose an empirical research program that transcends the dichotomy and sets the stage for revealing the mysterious hybrid algorithm of primate vision.
