Furniture assembly remains an unsolved problem in robotic manipulation because of its long task horizon and operation plans that do not generalize. This paper presents the Tactile Ensemble Skill Transfer (TEST) framework, a pioneering offline reinforcement learning (RL) approach that incorporates tactile feedback in the control loop. TEST's core design is to learn a skill transition model for high-level planning, along with a set of adaptive intra-skill goal-reaching policies. This design aims to solve the robotic furniture assembly problem in a more generalizable way by facilitating seamless chaining of skills across the long-horizon task. We first sample demonstrations from a set of heuristic policies, together with trajectories composed of randomized sub-skill segments, which yields rich robot trajectories that capture skill stages, robot states, visual indicators, and, crucially, tactile signals. Using these trajectories, our offline RL method identifies skill termination conditions and coordinates skill transitions. Our evaluations demonstrate the proficiency of TEST on in-distribution furniture assemblies, its adaptability to unseen furniture configurations, and its robustness against visual disturbances. Ablation studies further highlight the importance of two algorithmic components: the skill transition model and the tactile ensemble policies. Results indicate that TEST achieves a success rate of 90\% and is over 4 times more efficient than the heuristic policy in both in-distribution and generalization settings, suggesting a scalable skill transfer approach for contact-rich manipulation.
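To make the two-level design concrete, the following is a minimal Python sketch of a control loop in which a high-level model picks the next skill and a group of low-level goal-reaching policies acts within the current skill until a tactile signal indicates termination. It is an illustration under stated assumptions, not the TEST implementation: every name here (`Observation`, `proportional_policy`, `ensemble_action`, `skill_terminated`, `next_skill`, `toy_step`) is hypothetical, the learned skill transition model is replaced by a fixed lookup table, the learned policies are replaced by proportional controllers, the averaging rule for the ensemble is an illustrative choice, and the environment is a toy 1-D world.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class Observation:
    robot_state: Sequence[float]  # hypothetical layout, e.g. end-effector position
    tactile: Sequence[float]      # hypothetical layout, e.g. per-sensor contact readings


# Hypothetical signature: a low-level policy maps (observation, goal) to an action.
Policy = Callable[[Observation, Sequence[float]], List[float]]


def proportional_policy(gain: float) -> Policy:
    """Stand-in goal-reaching policy: step toward the goal proportionally."""
    def act(obs: Observation, goal: Sequence[float]) -> List[float]:
        return [gain * (g - s) for g, s in zip(goal, obs.robot_state)]
    return act


def ensemble_action(policies: Sequence[Policy], obs: Observation,
                    goal: Sequence[float]) -> List[float]:
    """Average the actions proposed by several policies (assumed combination rule)."""
    actions = [p(obs, goal) for p in policies]
    return [sum(dim) / len(actions) for dim in zip(*actions)]


def skill_terminated(obs: Observation, contact_threshold: float) -> bool:
    """Stand-in termination test: mean tactile reading reaches a threshold."""
    return sum(obs.tactile) / len(obs.tactile) >= contact_threshold


def next_skill(current: str, transitions: Dict[str, Optional[str]]) -> Optional[str]:
    """Stand-in for a learned skill transition model: a fixed lookup table."""
    return transitions.get(current)


def toy_step(obs: Observation, action: Sequence[float]) -> Observation:
    """Toy 1-D world: contact is sensed near hypothetical contact points 1.0 and 2.0."""
    position = obs.robot_state[0] + action[0]
    touching = any(abs(position - c) < 0.05 for c in (1.0, 2.0))
    return Observation(robot_state=[position], tactile=[1.0 if touching else 0.0])


def run_episode(start_skill: str,
                goals: Dict[str, Sequence[float]],
                policies: Dict[str, Sequence[Policy]],
                transitions: Dict[str, Optional[str]],
                obs: Observation,
                max_steps: int = 100) -> List[str]:
    """Chain skills: act within a skill until tactile termination, then transition."""
    skill: Optional[str] = start_skill
    completed: List[str] = []
    for _ in range(max_steps):
        if skill is None:
            break
        action = ensemble_action(policies[skill], obs, goals[skill])
        obs = toy_step(obs, action)
        if skill_terminated(obs, contact_threshold=0.5):
            completed.append(skill)
            skill = next_skill(skill, transitions)
    return completed


# Two hypothetical skills, each served by two policies with different gains.
goals = {"reach": [1.0], "insert": [2.0]}
policies = {name: [proportional_policy(0.4), proportional_policy(0.6)] for name in goals}
transitions = {"reach": "insert", "insert": None}
start = Observation(robot_state=[0.0], tactile=[0.0])
completed_skills = run_episode("reach", goals, policies, transitions, start)
print(completed_skills)
```

In this sketch the tactile reading alone decides when a skill ends, which mirrors the abstract's emphasis on tactile signals for termination; a learned system would presumably combine it with robot state and visual inputs.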