Recent work has shown that deep neural networks are capable of approximating both value functions and policies in reinforcement learning domains featuring continuous state and action spaces. However, to the best of our knowledge no previous work has succeeded at using deep neural networks in structured (parameterized) continuous action spaces. To fill this gap, this paper focuses on learning within the domain of simulated RoboCup soccer, which features a small set of discrete action types, each of which is parameterized with continuous variables. The best learned agent can score goals more reliably than the 2012 RoboCup champion agent. As such, this paper represents a successful extension of deep reinforcement learning to the class of parameterized action space MDPs.