In this paper, we present a novel method for dexterous manipulation of complex objects that keeps the object secure without relying on passive support surfaces. We posit that a key difficulty in training such policies with Reinforcement Learning is exploring the problem's state space, because the accessible regions of this space form a complex structure along manifolds of a high-dimensional space. To address this challenge, we use two versions of the non-holonomic Rapidly-Exploring Random Trees (RRT) algorithm. The first is more general but requires explicit use of the environment's transition function. The second exploits manipulation-specific kinematic constraints to achieve better sample efficiency. In both cases, we use states found through sampling-based exploration to generate reset distributions, which enable training control policies under full dynamic constraints with model-free Reinforcement Learning. We show that these policies are effective on manipulation problems more difficult than those addressed in prior work, and that they also transfer effectively to real robots. Videos of the real-hand demonstrations can be found on the project website: https://sbrl.cs.columbia.edu/
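To make the exploration-then-reset idea concrete, here is a minimal sketch of the general RRT variant, which needs an explicit transition function. It is not the authors' implementation. A toy non-holonomic unicycle stands in for the hand-object system. The helper `is_valid` is a hypothetical stand-in for the feasibility constraints, such as the object remaining secured. The names `unicycle_step`, `nonholonomic_rrt`, and `sample_reset` are all illustrative assumptions. Because the system cannot be steered directly toward a sampled target, each extension step tries random actions through the transition function and keeps the one that lands closest. The explored nodes then serve as a uniform reset distribution.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical toy system: (x, y, heading) of a unicycle, actions (speed, turn rate).
State = Tuple[float, float, float]
Action = Tuple[float, float]


def unicycle_step(s: State, a: Action, dt: float = 0.1) -> State:
    """Hypothetical transition function f(s, a) -> s' for a non-holonomic system."""
    x, y, th = s
    v, w = a
    return (x + v * math.cos(th) * dt,
            y + v * math.sin(th) * dt,
            (th + w * dt) % (2 * math.pi))


def distance(s1: State, s2: State, angle_weight: float = 0.5) -> float:
    # Euclidean position distance plus a weighted, wrapped heading difference.
    dth = abs(s1[2] - s2[2]) % (2 * math.pi)
    dth = min(dth, 2 * math.pi - dth)
    return math.hypot(s1[0] - s2[0], s1[1] - s2[1]) + angle_weight * dth


def is_valid(s: State) -> bool:
    # Hypothetical feasibility check (here: stay inside a 10 x 10 workspace).
    return 0.0 <= s[0] <= 10.0 and 0.0 <= s[1] <= 10.0


def nonholonomic_rrt(start: State,
                     step: Callable[[State, Action], State],
                     n_iters: int = 500,
                     n_action_samples: int = 10,
                     seed: int = 0) -> List[State]:
    rng = random.Random(seed)
    tree: List[State] = [start]
    for _ in range(n_iters):
        # 1. Sample a random target state.
        target = (rng.uniform(0, 10), rng.uniform(0, 10), rng.uniform(0, 2 * math.pi))
        # 2. Find the nearest node already in the tree.
        nearest = min(tree, key=lambda s: distance(s, target))
        # 3. Extend by simulating random actions and keeping the closest valid result.
        best: Optional[State] = None
        best_d = float("inf")
        for _ in range(n_action_samples):
            a = (rng.uniform(-1.0, 1.0), rng.uniform(-2.0, 2.0))
            s_new = step(nearest, a)
            d = distance(s_new, target)
            if is_valid(s_new) and d < best_d:
                best, best_d = s_new, d
        if best is not None:
            tree.append(best)
    return tree


def sample_reset(tree: Sequence[State], rng: random.Random) -> State:
    # Reset distribution: uniform over the states discovered by exploration.
    return rng.choice(tree)
```

In an RL training loop, each episode would then begin from `sample_reset(tree, rng)` rather than from a single fixed initial state. This exposes the policy to hard-to-reach regions of the state space that it would rarely visit on its own. The second, kinematics-based variant described above would replace `unicycle_step` with a cheaper constraint-based extension and is not shown here.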