Contrastive Reinforcement Learning (CRL) has recently succeeded in a wide variety of goal-conditioned robotics tasks by learning structured representations of the dynamics. However, despite its success in locomotion and simpler control domains, CRL often struggles in interaction-rich manipulation. We argue that a key source of this difficulty is object-centric interaction, such as contact or grasping, which induces distinct changes in the underlying dynamic modes. In this work, we formulate manipulation dynamics as a piecewise-smooth Markov process and show that interaction-induced mode changes create piecewise nonlinear reachability structures that are difficult for standard CRL energy functions to represent and plan over. Based on this analysis, we introduce Interaction-weighted Resampling (IWR). IWR performs interaction-aware resampling around the phases before, during, and after interactions, encouraging the learned representation to preserve the mode boundaries that determine future reachability and thereby capture multi-modal, piecewise nonlinear reachability. Across interaction-centric environments, including 2D dynamic control, robotic manipulation, and robot air hockey, IWR improves both sample efficiency and overall performance over prior CRL methods, with a 19.8% average improvement in simulation. Finally, using a sim-to-real pipeline with policies trained by IWR, we demonstrate the first real-world goal-conditioned robot air hockey agent capable of hitting goals, improving the success rate from 25% to 60%. Project Page: IWR-arxiv.github.io.
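The abstract does not specify how IWR weights its samples, so the following is only an illustrative sketch of what "interaction-aware resampling" could look like, not the authors' method. It assumes a per-step binary contact flag is available for each trajectory, and it introduces two hypothetical parameters, `window` (how many steps before and after a contact count as part of the interaction phase) and `boost` (how much more often those steps are drawn than the rest).

```python
import random


def interaction_weights(contact, window=2, boost=4.0):
    """Return per-step sampling probabilities for one trajectory.

    `contact` is a sequence of 0/1 flags (assumed input, not from the paper).
    Steps within `window` of any contact get relative weight `boost`;
    all other steps get relative weight 1.
    """
    n = len(contact)
    near = [False] * n
    for t, flag in enumerate(contact):
        if flag:
            # Mark the steps before, during, and after the interaction.
            for s in range(max(0, t - window), min(n, t + window + 1)):
                near[s] = True
    raw = [boost if flag else 1.0 for flag in near]
    total = sum(raw)
    return [w / total for w in raw]


def resample_indices(contact, batch_size, seed=0, **kwargs):
    """Draw time-step indices with replacement, favoring interaction phases."""
    probs = interaction_weights(contact, **kwargs)
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)


if __name__ == "__main__":
    flags = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
    print(interaction_weights(flags, window=1, boost=3.0))
    print(resample_indices(flags, batch_size=8, window=1, boost=3.0))
```

In a CRL training loop, indices drawn this way would stand in for uniform sampling of anchor states from the replay buffer, so that contrastive pairs are concentrated near the mode boundaries; how the actual method combines this with goal sampling is not stated in the abstract.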