General robot skill adaptation requires expressive representations that are robust to varying task configurations. Although recent learning-based skill adaptation methods refined via Reinforcement Learning (RL) have shown success, existing skill models often lack the representational capacity to handle more than minor environmental changes. In contrast, Gaussian Process (GP)-based skill modelling provides an expressive representation with useful analytical properties; however, the adaptation of GP-based skills remains underexplored. This paper proposes a novel, robust skill adaptation framework that uses GPs conditioned on sparse via-points for compact and expressive modelling. The model considers the trajectory's poses and leverages the trajectory's analytical first and second derivatives to preserve the skill's kinematic profile. We present three adaptation methods to handle the variability between the initial and observed configurations: first, an optimisation agent that adjusts the path's via-points while preserving the demonstration velocity; second, a behaviour cloning agent trained to replicate the output trajectories of the optimisation agent; and third, an RL agent that learns to modify via-points while maintaining the kinematic profile and enabling online operation. Evaluated on three tasks (drawer opening, cube pushing and bar manipulation) both in simulation and on hardware, our proposed methods achieve higher success rates than all benchmark methods. Furthermore, the results show that the GP-based representation enables all three methods to attain high cosine similarity and low velocity-magnitude error, indicating strong preservation of the kinematic profile. Overall, our formulation provides a compact representation capable of adapting to large deviations from a single demonstrated skill.
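The sketch below illustrates the two technical ideas named above: a GP trajectory conditioned on sparse via-points whose first and second time derivatives are available in closed form, and the two kinematic-profile metrics (cosine similarity and velocity-magnitude error). The abstract does not specify the kernel, the prior mean or the exact metric definitions. The sketch therefore assumes a squared-exponential kernel over normalised time, uses the via-point average as a constant prior mean, and averages the per-sample cosine similarity and absolute speed difference over time. `ViaPointGP`, `kinematic_metrics` and the example goal displacement are hypothetical and are not the authors' implementation.

```python
import numpy as np


def se_kernel(a, b, length_scale):
    """Squared-exponential kernel matrix and pairwise differences (a_i - b_j)."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * diff**2 / length_scale**2), diff


class ViaPointGP:
    """Hypothetical GP trajectory model conditioned on sparse via-points.

    All pose dimensions share one kernel over time, so a single weight matrix
    alpha (n_via x n_dims) yields the posterior mean for every dimension.
    Assumption: the via-point average is used as a constant prior mean.
    """

    def __init__(self, t_via, y_via, length_scale=0.2, noise=1e-6):
        self.t_via = np.asarray(t_via, dtype=float)
        y_via = np.asarray(y_via, dtype=float)
        self.ell = length_scale
        self.offset = y_via.mean(axis=0)
        K, _ = se_kernel(self.t_via, self.t_via, self.ell)
        K += noise * np.eye(len(self.t_via))  # small jitter for numerical stability
        self.alpha = np.linalg.solve(K, y_via - self.offset)

    def mean(self, t, order=0):
        """Posterior mean (order=0) or its analytical 1st/2nd time derivative."""
        k, diff = se_kernel(np.asarray(t, dtype=float), self.t_via, self.ell)
        if order == 0:
            return k @ self.alpha + self.offset
        if order == 1:
            # d/dt k(t, t') = -(t - t') / ell^2 * k(t, t')
            return (-diff / self.ell**2 * k) @ self.alpha
        if order == 2:
            # d^2/dt^2 k(t, t') = ((t - t')^2 / ell^4 - 1 / ell^2) * k(t, t')
            return ((diff**2 / self.ell**4 - 1.0 / self.ell**2) * k) @ self.alpha
        raise ValueError("order must be 0, 1 or 2")


def kinematic_metrics(v_ref, v_new, eps=1e-9):
    """Mean cosine similarity and mean absolute speed error between velocity profiles."""
    n_ref = np.linalg.norm(v_ref, axis=1)
    n_new = np.linalg.norm(v_new, axis=1)
    cos = np.sum(v_ref * v_new, axis=1) / np.maximum(n_ref * n_new, eps)
    return float(cos.mean()), float(np.abs(n_new - n_ref).mean())


# Usage: a 3-D demonstration through six via-points, then a displaced goal.
t_via = np.linspace(0.0, 1.0, 6)
y_demo = np.stack([t_via, 0.2 * np.sin(np.pi * t_via), np.zeros_like(t_via)], axis=1)
demo = ViaPointGP(t_via, y_demo)

y_adapt = y_demo.copy()
y_adapt[-1] += np.array([0.0, 0.1, 0.05])  # hypothetical goal displacement
adapted = ViaPointGP(t_via, y_adapt)

t = np.linspace(0.0, 1.0, 200)
cos_sim, speed_err = kinematic_metrics(demo.mean(t, order=1), adapted.mean(t, order=1))
print(f"cosine similarity: {cos_sim:.3f}, speed error: {speed_err:.3f}")
```

Because the derivatives are taken through the kernel, velocities and accelerations come directly from the same weights `alpha` as the poses. Finite differencing of sampled poses is not needed. With the constant prior mean, a rigid translation of all via-points leaves the velocity profile unchanged. Moving a single via-point, as in the usage example, perturbs it only locally on the scale of the length scale.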