Learning Robot Soccer from Egocentric Vision with Deep Reinforcement Learning

Dhruva Tirumala, Markus Wulfmeier, Ben Moran, Sandy Huang, Jan Humplik, Guy Lever, Tuomas Haarnoja, Leonard Hasenclever, Arunkumar Byravan, Nathan Batchelor, Neil Sreendra, Kushal Patel, Marlon Gwira, Francesco Nori, Martin Riedmiller, Nicolas Heess

We apply multi-agent deep reinforcement learning (RL) to train end-to-end robot soccer policies with fully onboard computation and sensing via egocentric RGB vision. This setting reflects many challenges of real-world robotics, including active perception, agile full-body control, and long-horizon planning in a dynamic, partially observable, multi-agent domain. We rely on large-scale, simulation-based data generation to obtain complex behaviors from egocentric vision that can be successfully transferred to physical robots using low-cost sensors. To achieve adequate visual realism, our simulation combines rigid-body physics with learned, realistic rendering via multiple Neural Radiance Fields (NeRFs). We combine teacher-based multi-agent RL and cross-experiment data reuse to enable the discovery of sophisticated soccer strategies. We analyze active-perception behaviors, including object tracking and ball seeking, that emerge when simply optimizing perception-agnostic soccer play. The agents achieve levels of performance and agility equivalent to those of policies with access to privileged, ground-truth state. To our knowledge, this paper constitutes a first demonstration of end-to-end training for multi-agent robot soccer, mapping raw pixel observations to joint-level actions, that can be deployed in the real world. Videos of the gameplay and analyses are available on our website (https://sites.google.com/view/vision-soccer).
