Many real-world manipulation tasks consist of a series of subtasks that are significantly different from one another. Such long-horizon, complex tasks highlight the potential of dexterous hands, which are adaptable and versatile and can transition seamlessly between different modes of functionality without re-grasping or external tools. However, challenges arise from the high-dimensional action space of dexterous hands and the complex compositional dynamics of long-horizon tasks. We present Sequential Dexterity, a general system based on reinforcement learning (RL) that chains multiple dexterous policies to achieve long-horizon task goals. The core of the system is a transition feasibility function that progressively finetunes the sub-policies to improve the chaining success rate, while also enabling autonomous policy-switching for recovery from failures and bypassing redundant stages. Despite being trained only in simulation with a few task objects, our system generalizes to novel object shapes and transfers zero-shot to a real-world robot equipped with a dexterous hand. More details and video results can be found at https://sequential-dexterity.github.io
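The policy-switching idea can be illustrated with a small sketch. This is not the authors' implementation: it assumes, purely for illustration, that a feasibility function returns one score per stage estimating whether that stage's sub-policy can succeed from the current state, and that the controller runs the latest stage whose score clears a threshold. The names `select_stage`, `run_chain`, `toy_scores`, and `make_policy`, the threshold value, and the integer "progress" state are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

State = int  # toy state (assumption): number of stages already completed


def select_stage(scores: Sequence[float], threshold: float = 0.5) -> int:
    """Pick the latest stage whose feasibility score meets the threshold.

    Choosing the latest feasible stage skips stages that are already
    satisfied; after a failure, later stages score low, so an earlier
    stage is selected again. Falls back to stage 0 if none is feasible.
    """
    for i in range(len(scores) - 1, -1, -1):
        if scores[i] >= threshold:
            return i
    return 0


def run_chain(
    policies: Sequence[Callable[[State], Tuple[State, bool]]],
    score_fn: Callable[[State], Sequence[float]],
    state: State,
    max_steps: int = 20,
    threshold: float = 0.5,
) -> Tuple[State, List[int]]:
    """Repeatedly choose a stage by feasibility and run its sub-policy."""
    trace: List[int] = []
    for _ in range(max_steps):
        stage = select_stage(score_fn(state), threshold)
        trace.append(stage)
        state, done = policies[stage](state)
        if done:
            break
    return state, trace


N_STAGES = 3  # hypothetical number of subtasks


def toy_scores(state: State) -> List[float]:
    # Stand-in for a learned feasibility function (assumption):
    # stage i is feasible once the first i stages are complete.
    return [1.0 if state >= i else 0.0 for i in range(N_STAGES)]


def make_policy(i: int) -> Callable[[State], Tuple[State, bool]]:
    # Stand-in for a trained sub-policy (assumption): completes stage i
    # when its precondition holds, otherwise leaves the state unchanged.
    def policy(state: State) -> Tuple[State, bool]:
        new_state = i + 1 if state >= i else state
        return new_state, new_state == N_STAGES
    return policy


POLICIES = [make_policy(i) for i in range(N_STAGES)]
```

Under these assumptions, `run_chain(POLICIES, toy_scores, 1)` starts from a state in which the first stage is already complete, so the controller bypasses stage 0 and runs only stages 1 and 2. In the actual system the feasibility function is learned and is also used to finetune the sub-policies, which this sketch does not model.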