Animals and robots exist in a physical world and must coordinate their bodies to achieve behavioral objectives. With recent developments in deep reinforcement learning, scientists and engineers can now obtain sensorimotor strategies (policies) for specific tasks using physically simulated bodies and environments. However, the utility of these methods extends beyond any specific task: they offer an exciting framework for understanding how an animal's sensorimotor system is organized in relation to its morphology and its physical interaction with the environment, as well as for deriving general design rules for sensing and actuation in robotic systems. Algorithms and code implementing both learning agents and environments are increasingly available, but the basic assumptions and choices that go into formulating an embodied feedback control problem with deep reinforcement learning may not be immediately apparent. Here, we present a concise exposition of the mathematical and algorithmic aspects of model-free reinforcement learning, specifically through the use of \textit{actor-critic} methods, as a tool for investigating the feedback control underlying animal and robotic behavior.