We propose a Model-Based Reinforcement Learning (MBRL) algorithm named VF-MC-PILCO, designed for mechanical systems in which velocities cannot be directly measured. If not adequately addressed, this limitation can compromise the success of MBRL approaches. To cope with this problem, we define a velocity-free state formulation consisting of a history of past positions and inputs. VF-MC-PILCO then uses Gaussian Process Regression to model the dynamics of the velocity-free state and optimizes the control policy through a particle-based policy gradient approach. We compare VF-MC-PILCO with our previous MBRL algorithm, MC-PILCO4PMS, which handles the lack of direct velocity measurements by modeling the presence of velocity estimators. Results on both simulated systems (a cart-pole and a UR5 robot) and real mechanical systems (a Furuta pendulum and a ball-and-plate rig) show that the two algorithms achieve similar performance. Conveniently, VF-MC-PILCO does not require the design and implementation of state estimators, a task that can be challenging and time-consuming and that typically requires an expert user.
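To make the velocity-free state concrete, the following minimal sketch stacks the most recent positions and the inputs that preceded them into a single vector. It is an illustration under stated assumptions rather than the paper's implementation: the function name `velocity_free_state`, the memory lengths `n_pos` and `n_in`, and the ordering of the entries are all hypothetical choices made for this example.

```python
from typing import List, Sequence


def velocity_free_state(
    positions: Sequence[Sequence[float]],
    inputs: Sequence[Sequence[float]],
    t: int,
    n_pos: int = 2,
    n_in: int = 1,
) -> List[float]:
    """Build a velocity-free state at time step t (illustrative layout).

    Assumed layout (not taken from the paper):
        [q_t, q_{t-1}, ..., q_{t-n_pos+1}, u_{t-1}, ..., u_{t-n_in}]
    where q_k is the position vector and u_k the input vector at step k.
    """
    # The oldest sample needed determines the earliest valid time step.
    oldest = max(n_pos - 1, n_in)
    if t < oldest:
        raise ValueError("not enough history to build the state at step t")
    if t >= len(positions) or (n_in > 0 and t - 1 >= len(inputs)):
        raise ValueError("step t is beyond the recorded trajectory")

    state: List[float] = []
    # Current and past positions, most recent first.
    for k in range(n_pos):
        state.extend(positions[t - k])
    # Past inputs, most recent first (the input at step t is not yet applied).
    for k in range(1, n_in + 1):
        state.extend(inputs[t - k])
    return state


# Example: one joint, three recorded steps.
q = [[0.0], [0.1], [0.3]]
u = [[1.0], [2.0], [0.0]]
x = velocity_free_state(q, u, t=2, n_pos=2, n_in=1)
print(x)  # [0.3, 0.1, 2.0]
```

With two consecutive positions in the state, the velocity is implicitly available to the model as a finite difference, (q_t - q_{t-1}) / dt, so no explicit velocity estimator needs to be designed. A dynamics model trained on such states would predict the next position, after which the history is shifted by one step.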