In this paper, we present a novel method for dexterous manipulation of complex objects in which the hand must also keep the object secured without relying on passive support surfaces. We posit that a key obstacle to training such policies in a Reinforcement Learning framework is the difficulty of exploring the problem state space, since the accessible regions of this space form a complex structure along manifolds of a high-dimensional space. To address this challenge, we use two versions of the non-holonomic Rapidly-Exploring Random Trees algorithm: the first is more general but requires explicit use of the environment's transition function, while the second uses manipulation-specific kinematic constraints to attain better sample efficiency. In both cases, we use states found via sampling-based exploration to generate reset distributions that enable training control policies under full dynamic constraints via model-free Reinforcement Learning. We show that these policies are effective on manipulation problems of higher difficulty than previously shown, and that they transfer effectively to real robots. Videos of the real-hand demonstrations can be found on the project website: https://sbrl.cs.columbia.edu/
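To make the exploration-then-reset idea concrete, the following is a minimal toy sketch, not the implementation used in this work. It assumes a 2D point as a stand-in for the hand-object state, and the names `step`, `is_valid`, `explore`, and `sample_reset` are hypothetical. The tree is grown in the style of the more general variant: random actions are pushed through the transition function, and the resulting valid state closest to a random target is kept. The tree nodes then serve as a reset distribution for a model-free learner.

```python
import math
import random


def step(state, action):
    """Hypothetical transition function: move a 2D point, clipped to the unit square."""
    x, y = state
    dx, dy = action
    return (min(1.0, max(0.0, x + dx)), min(1.0, max(0.0, y + dy)))


def is_valid(state):
    """Hypothetical validity check, standing in for 'the object is still secured'."""
    x, y = state
    # A wall with a gap near the top makes the valid region harder to explore.
    return not (0.45 < x < 0.55 and y < 0.8)


def explore(start, n_iters, n_actions=8, max_step=0.05, rng=None):
    """Grow a tree of reachable states by rolling random actions through `step`."""
    rng = rng or random.Random(0)
    nodes = [start]
    parents = {0: None}
    for _ in range(n_iters):
        # Sample a random target and find the closest existing node.
        target = (rng.random(), rng.random())
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], target))
        # States cannot be connected directly, so try several random actions
        # and keep the valid successor that ends up closest to the target.
        best = None
        for _ in range(n_actions):
            action = (rng.uniform(-max_step, max_step),
                      rng.uniform(-max_step, max_step))
            cand = step(nodes[i_near], action)
            if is_valid(cand) and (
                best is None or math.dist(cand, target) < math.dist(best, target)
            ):
                best = cand
        if best is not None:
            parents[len(nodes)] = i_near
            nodes.append(best)
    return nodes, parents


def sample_reset(nodes, rng):
    """Draw an episode start state uniformly from the explored states."""
    return rng.choice(nodes)


if __name__ == "__main__":
    nodes, parents = explore(start=(0.1, 0.1), n_iters=500)
    rng = random.Random(1)
    print(len(nodes), "explored states; example reset:", sample_reset(nodes, rng))
```

In a training loop, each episode would begin from `sample_reset(nodes, rng)` rather than from a single fixed initial state, so that the policy sees states across the explored region from the start of training. Uniform sampling over tree nodes is an assumption of this sketch; other weightings are possible.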