Animal vision is thought to optimize various objectives, from metabolic efficiency to discrimination performance, yet its ultimate objective is to facilitate the survival of the animal within its ecological niche. Modeling animal behavior in complex environments, however, has been challenging. To study how environments shape and constrain visual processing, we developed a deep reinforcement learning framework in which an agent moves through a 3D environment that it perceives through a vision model, and in which its only goal is to survive. Within this framework we developed a foraging task in which the agent must gather food that sustains it and avoid food that harms it. We first established that the complexity of the vision model required for survival on this task scaled with the variety and visual complexity of the food in the environment. Moreover, we showed that a recurrent network architecture was necessary to fully exploit complex vision models on the most visually demanding tasks. Finally, we showed that different network architectures learned distinct representations of the environment and task, and led the agent to exhibit distinct behavioural strategies. In summary, this paper lays the foundation for a computational approach to visual ecology, provides extensive benchmarks for future work, and demonstrates how representations and behaviour emerge from an agent's drive for survival.