In the context of the Robot Air Hockey Challenge 2023, we investigate the applicability of model-based deep reinforcement learning to acquire a policy capable of autonomously playing air hockey. Our agents learn solely from sparse rewards while incorporating self-play to iteratively refine their behaviour over time. The robotic manipulator is interfaced through continuous high-level actions for position-based control in the Cartesian plane, under partial observability of the environment and stochastic transitions. We demonstrate that agents are prone to overfitting when trained solely against a single playstyle, highlighting the importance of self-play for generalization to the novel strategies of unseen opponents. Furthermore, we explore the impact of the imagination horizon in the competitive setting of the highly dynamic game of air hockey, with longer horizons resulting in more stable learning and better overall performance.