Peg-in-hole assembly in unknown environments is a challenging task because onboard sensor errors introduce uncertainty and variation in task parameters such as the hole position and orientation. Meta Reinforcement Learning (Meta RL) has been proposed to mitigate this problem, as it learns how to adapt quickly to new tasks with different parameters. However, previous approaches depend either on a sample-inefficient procedure or on human demonstrations to perform the task in the real world. Our work modifies the data used by the Meta RL agent and uses simple features that can be easily measured in the real world, even with an uncalibrated camera. We further adapt the Meta RL agent to use data from a force/torque sensor, instead of the camera, to perform the assembly, using only a small amount of training data. Finally, we propose a fine-tuning method that consistently and safely adapts to out-of-distribution tasks whose parameters differ by a factor of 10 from those of the training tasks. Our results demonstrate that the proposed data modification significantly improves training and adaptation efficiency and enables the agent to achieve 100% success in tasks with different hole positions and orientations. Experiments on a real robot confirm that both the camera-equipped and the force/torque sensor-equipped agents achieve 100% success in tasks with unknown hole positions, matching their simulation performance and validating the approach's robustness and applicability. Compared with previous work that relies on sample-inefficient adaptation, our proposed methods are 10 times more sample-efficient in real-world tasks.