Reinforcement learning (RL) has become widely adopted in robot control. Despite many successes, one major persistent problem is very low data efficiency. One solution is interactive feedback, which has been shown to speed up RL considerably. As a result, there is an abundance of different strategies, which are, however, primarily tested on discrete grid-world and small-scale optimal control scenarios. In the literature, there is no consensus about which feedback frequency is optimal or at which time the feedback is most beneficial. To resolve these discrepancies, we isolate and quantify the effect of feedback frequency in robotic tasks with continuous state and action spaces. The experiments encompass inverse kinematics learning for robotic manipulator arms of different complexity. We show that seemingly contradictory reported phenomena occur at different complexity levels. Furthermore, our results suggest that no single ideal feedback frequency exists. Rather, feedback frequency should be changed as the agent's proficiency in the task increases.