In this paper, we present a novel method for dexterous manipulation of complex objects in which the hand must also keep the object secured without the use of passive support surfaces. We posit that a key obstacle to training such policies in a Reinforcement Learning framework is exploring the problem state space, because the accessible regions of this space form a complex structure along manifolds of a high-dimensional space. To address this challenge, we use two versions of the non-holonomic Rapidly-Exploring Random Trees algorithm: the first is more general but requires explicit use of the environment's transition function, while the second uses manipulation-specific kinematic constraints to attain better sample efficiency. In both cases, we use the states found via sampling-based exploration to generate reset distributions, which enable training control policies under full dynamic constraints via model-free Reinforcement Learning. We show that these policies are effective at manipulation problems of higher difficulty than previously shown, and that they transfer effectively to real robots. Videos of the real-hand demonstrations can be found on the project website: https://sbrl.cs.columbia.edu/
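The abstract gives no implementation details, so the following is only a minimal sketch of the general idea behind the first variant: grow a tree by calling a transition function directly, then reuse the visited states as a reset distribution. Everything here is an assumption made for illustration, including the toy 2-D point state, the names `toy_step`, `is_valid`, `explore_rrt`, and `sample_reset`, and the random-action extension rule. It is not the authors' algorithm, hand model, or simulator.

```python
import math
import random


def toy_step(state, action, dt=0.1):
    """Hypothetical transition function: a 2-D point pushed by a bounded action."""
    x, y = state
    ax, ay = action
    return (x + dt * ax, y + dt * ay)


def is_valid(state):
    """Hypothetical stand-in for 'the object is still secured': stay in the unit square."""
    x, y = state
    return 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0


def explore_rrt(start, n_iters=500, n_actions=8, seed=0):
    """Grow a tree from `start` using only forward calls to the transition function."""
    rng = random.Random(seed)
    nodes = [start]
    parents = {0: None}  # node index -> parent index
    for _ in range(n_iters):
        # Sample a random target and find the tree node closest to it.
        target = (rng.random(), rng.random())
        near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], target))
        # Try a few random actions and keep the valid successor closest to the target.
        best = None
        for _ in range(n_actions):
            action = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
            cand = toy_step(nodes[near], action)
            if is_valid(cand) and (
                best is None or math.dist(cand, target) < math.dist(best, target)
            ):
                best = cand
        if best is not None:
            parents[len(nodes)] = near
            nodes.append(best)
    return nodes, parents


def sample_reset(nodes, rng):
    """Draw an episode start state uniformly from the explored tree nodes."""
    return rng.choice(nodes)


if __name__ == "__main__":
    nodes, _ = explore_rrt((0.5, 0.5))
    print(len(nodes), "explored states; example reset:", sample_reset(nodes, random.Random(1)))
```

In a training loop, `sample_reset` would be called at the start of each episode so that the policy begins from many different reachable states rather than from a single initial configuration.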