Robot learning is often difficult because gathering data is expensive. The need for large amounts of data can, and should, be tackled with effective algorithms and by leveraging expert knowledge of robot dynamics. Bayesian reinforcement learning (BRL), thanks to its sample efficiency and ability to exploit prior knowledge, is uniquely positioned as such a solution method. Unfortunately, the application of BRL has been limited by the difficulty of representing expert knowledge and of solving the subsequent inference problem. This paper advances BRL for robotics by proposing a specialized framework for physical systems. In particular, we capture expert knowledge in a factored representation, show that the posterior factorizes in a similar form, and formalize the resulting model in a Bayesian framework. We then introduce a sample-based online solution method, based on Monte-Carlo tree search and particle filtering, specialized to solve this model. This approach can, for example, use typical low-level robot simulators and handle uncertainty over unknown dynamics of the environment. We empirically demonstrate its efficiency by performing on-robot learning in two human-robot interaction tasks with uncertainty about human behavior, achieving near-optimal performance after only a handful of real-world episodes. A video of the learned policies is available at https://youtu.be/H9xp60ngOes.
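To make the particle-filtering idea concrete, the sketch below shows one common way to track a belief over an unknown dynamics parameter using only a black-box simulator: rejection sampling. It is an illustration under stated assumptions, not the method described above. The simulator `simulate`, the scalar parameter `theta` (here, the probability that a move succeeds), and the fully observed one-dimensional state are all hypothetical, and the tree-search planner is omitted.

```python
import random


def simulate(state, action, theta, rng):
    """Hypothetical black-box simulator (an assumption for illustration).

    The state is a 1-D position. The move succeeds with probability theta,
    which stands in for an unknown dynamics parameter.
    """
    moved = rng.random() < theta
    next_state = state + action if moved else state
    observation = next_state  # state is fully observed in this toy example
    return next_state, observation


def update_belief(particles, state, action, observation, rng, max_tries=100_000):
    """Rejection-sampling belief update over theta.

    Each particle is one hypothesis about theta. A particle is kept only if
    simulating the real (state, action) pair under that hypothesis reproduces
    the observation that was actually received.
    """
    new_particles = []
    tries = 0
    while len(new_particles) < len(particles) and tries < max_tries:
        tries += 1
        theta = rng.choice(particles)  # sample a hypothesis from the current belief
        _, simulated_obs = simulate(state, action, theta, rng)
        if simulated_obs == observation:  # accept only consistent hypotheses
            new_particles.append(theta)
    # Fall back to the old belief if no sample was consistent with the observation.
    return new_particles or particles
```

Starting from a belief split evenly between `theta = 0.1` and `theta = 0.9`, one observed successful move should shift most particles toward `0.9`. A planner could then sample particles from the updated belief when simulating future trajectories.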