In this paper, we aim to improve exploration in black-box methods, particularly Evolution Strategies (ES), when applied to Reinforcement Learning (RL) problems where intermediate waypoints or subgoals are available. Because ES is highly parallelizable and evaluates many rollouts per update, we do not extract only a scalar cumulative reward from each rollout; we also use the state-action pairs from the resulting trajectories to learn the dynamics of the agent. The learned dynamics are then used in the optimization procedure to speed up training. Finally, we demonstrate that the proposed approach applies across domains by presenting experimental results on the CARLA driving simulator and a UR5 robotic arm simulator.
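The data-reuse idea can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes a hypothetical one-dimensional toy environment with a single goal position, a two-parameter linear policy, a vanilla ES update, and a linear least-squares dynamics model. The names `rollout`, `es_step`, and `fit_linear_dynamics` are invented for illustration. The abstract does not specify how the learned dynamics enter the optimization procedure, so that step is deliberately left out.

```python
import numpy as np

def rollout(theta, horizon=20, goal=1.0):
    """Run a linear policy on a toy 1-D environment (assumed for illustration).

    Returns the cumulative reward AND the visited (s, a, s') transitions,
    instead of discarding the trajectory after scoring it.
    """
    s, total_reward, transitions = 0.0, 0.0, []
    for _ in range(horizon):
        a = theta[0] * (goal - s) + theta[1]
        s_next = s + 0.1 * a              # true dynamics, unknown to the learner
        transitions.append((s, a, s_next))
        total_reward -= (goal - s_next) ** 2
        s = s_next
    return total_reward, transitions

def es_step(theta, rng, pop_size=16, sigma=0.1, lr=0.05):
    """One vanilla ES update that also returns all transitions from the population."""
    noise = rng.standard_normal((pop_size, theta.size))
    rewards, data = [], []
    for eps in noise:
        r, transitions = rollout(theta + sigma * eps)
        rewards.append(r)
        data.extend(transitions)
    rewards = np.asarray(rewards)
    # Standardize returns so the step size is insensitive to reward scale.
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    grad = noise.T @ advantages / (pop_size * sigma)
    return theta + lr * grad, data

def fit_linear_dynamics(data):
    """Least-squares fit of s' ~= w_s * s + w_a * a + b from collected transitions."""
    arr = np.asarray(data)
    X = np.column_stack([arr[:, 0], arr[:, 1], np.ones(len(arr))])
    coef, *_ = np.linalg.lstsq(X, arr[:, 2], rcond=None)
    return coef  # [w_s, w_a, b]

rng = np.random.default_rng(0)
theta = np.zeros(2)
buffer = []
for _ in range(5):
    theta, data = es_step(theta, rng)
    buffer.extend(data)

coef = fit_linear_dynamics(buffer)
print("fitted [w_s, w_a, b]:", coef)
```

In this sketch every perturbed policy contributes its full trajectory to `buffer`, so the dynamics model is trained on data that a reward-only ES loop would have thrown away. The linear model is a stand-in chosen for brevity; any regressor could take its place.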