Robot decision-making in partially observable, real-time, dynamic, and multi-agent environments remains a difficult and unsolved challenge. Model-free reinforcement learning (RL) is a promising approach to learning decision-making in such domains; however, end-to-end RL in complex environments is often intractable. To address this challenge in the RoboCup Standard Platform League (SPL) domain, we developed a novel architecture that integrates RL within a classical robotics stack, employing a multi-fidelity sim2real approach and decomposing behavior into learned sub-behaviors with heuristic selection. Our architecture led to victory in the 2024 RoboCup SPL Challenge Shield Division. In this work, we fully describe our system's architecture and empirically analyze the key design decisions that contributed to its success. Our approach demonstrates how RL-based behaviors can be integrated into complete robot behavior architectures.