Classical policy search algorithms for robotics typically require extensive exploration, which is time-consuming and expensive to carry out on real physical platforms. To facilitate the efficient learning of robot manipulation skills, we propose a new approach comprising three modules: (1) learning general prior knowledge through random exploration in simulation, including state representations, dynamics models, and the constrained action space of the task; (2) extracting a state alignment-based reward function from a single demonstration video; and (3) optimizing the imitation policy in real time under systematic safety constraints with sampling-based model predictive control. The result is an efficient one-shot imitation-from-video strategy that simplifies the learning and execution of robot skills in real applications. Specifically, we learn priors in one scene of a task family and then deploy the policy in a novel scene immediately after a single demonstration, avoiding time-consuming and risky exploration in the environment. Because we do not make a strong assumption of consistent dynamics between the scenes, the priors can be learned in simulation, which avoids collecting data in the real world. We evaluate the effectiveness of our approach on contact-rich fabric manipulation, a common scenario in industrial and domestic tasks. Detailed numerical simulations and real-world hardware experiments show that our method achieves rapid skill acquisition for challenging manipulation tasks.
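To make module (3) concrete, the following is a minimal sketch of sampling-based model predictive control driven by a state-alignment reward. It is an illustration under strong assumptions, not the method evaluated in this work: the dynamics model is a hypothetical toy (`toy_dynamics`), the reward is a plain squared distance to demonstration states that are given directly rather than extracted from video, the optimizer is simple random shooting, and the only constraint is a box on the action space. All function names and parameters are chosen for illustration.

```python
import numpy as np


def rollout(dynamics, state, actions):
    """Roll one candidate action sequence through the dynamics model."""
    states = []
    for action in actions:
        state = dynamics(state, action)
        states.append(state)
    return np.stack(states)


def alignment_reward(states, demo_states):
    """Toy state-alignment reward: negative squared distance to the demo states."""
    return -np.sum((states - demo_states) ** 2)


def mpc_step(dynamics, state, demo_states, action_low, action_high,
             n_samples=256, rng=None):
    """Random-shooting MPC: sample action sequences inside the allowed box,
    score each rollout against the demonstration, return the best first action."""
    rng = np.random.default_rng() if rng is None else rng
    horizon = len(demo_states)
    action_dim = action_low.shape[0]
    # Sampling only inside [action_low, action_high] enforces the action constraint.
    candidates = rng.uniform(action_low, action_high,
                             size=(n_samples, horizon, action_dim))
    scores = [alignment_reward(rollout(dynamics, state, seq), demo_states)
              for seq in candidates]
    best = candidates[int(np.argmax(scores))]
    return best[0]


def toy_dynamics(state, action):
    # Hypothetical stand-in for a learned model: a point displaced by the action.
    return state + action


def run_episode(demo, start, action_low, action_high, lookahead=5, seed=0):
    """Track a demonstration trajectory with receding-horizon control."""
    rng = np.random.default_rng(seed)
    state = start.copy()
    for t in range(len(demo)):
        # Plan against the next few demonstration states, then execute one action.
        target = demo[t:t + lookahead]
        action = mpc_step(toy_dynamics, state, target,
                          action_low, action_high, rng=rng)
        state = toy_dynamics(state, action)
    return state


if __name__ == "__main__":
    # A straight-line demonstration in 2D, ten steps long.
    demo = np.linspace(0.1, 1.0, 10)[:, None] * np.ones(2)
    low, high = np.full(2, -0.3), np.full(2, 0.3)
    final = run_episode(demo, np.zeros(2), low, high)
    print("distance to final demo state:", np.linalg.norm(final - demo[-1]))
```

Only the first action of the best sequence is executed before replanning, which is what lets the controller correct model error at every step. A practical system would likely replace random shooting with an iterative sampler and add constraints on predicted states, not only on actions.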