Where to Touch, How to Contact: Hierarchical RL-MPC Framework for Geometry-Aware Long-Horizon Dexterous Manipulation

A key challenge in contact-rich dexterous manipulation is the need to reason jointly over geometry, kinematic constraints, and intricate, nonsmooth contact dynamics. End-to-end visuomotor policies bypass this structure, but they often require large amounts of data, transfer poorly from simulation to reality, and generalize weakly across tasks and embodiments. We address these limitations by leveraging a simple insight: dexterous manipulation is inherently hierarchical. At the high level, a robot decides where to touch (geometry) and how to move the object (kinematics); at the low level, it determines how to realize that plan through contact dynamics. Building on this insight, we propose a hierarchical RL-MPC framework in which a high-level reinforcement learning (RL) policy predicts a contact intention, a novel object-centric interface that specifies (i) an object-surface contact location and (ii) a post-contact object-level subgoal pose. Conditioned on this contact intention, a low-level contact-implicit model predictive controller (MPC) optimizes local contact modes and replans with contact dynamics to generate robot actions that robustly drive the object toward each subgoal. We evaluate the framework on non-prehensile tasks, including geometry-generalized pushing and 3D object reorientation. It achieves near-100% success with substantially less data (10x less than end-to-end baselines), highly robust performance, and zero-shot sim-to-real transfer.
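
To make the contact-intention interface concrete, the following is a minimal Python sketch of how the two levels might exchange information. It is an illustration under stated assumptions, not the authors' implementation: the names `Pose`, `ContactIntention`, `toy_policy`, `toy_controller`, and `run_hierarchy` are hypothetical, the object is treated as a sphere with identity orientation, and the two toy functions are simple geometric stand-ins for the learned RL policy and the contact-implicit MPC, neither of which is reproduced here.

```python
from dataclasses import dataclass
from math import dist
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Pose:
    """Object pose: position plus unit quaternion (w, x, y, z)."""
    position: Vec3
    orientation: Tuple[float, float, float, float] = (1.0, 0.0, 0.0, 0.0)


@dataclass(frozen=True)
class ContactIntention:
    """Object-centric interface passed from the high level to the low level."""
    contact_point: Vec3  # (i) contact location on the object surface, object frame
    subgoal: Pose        # (ii) post-contact object-level subgoal pose


Policy = Callable[[Pose, Pose], ContactIntention]
Controller = Callable[[Pose, ContactIntention], Pose]


def toy_policy(pose: Pose, goal: Pose, step: float = 0.1,
               radius: float = 0.05) -> ContactIntention:
    """Stand-in for the RL policy (assumption): push from behind the object."""
    gap = dist(pose.position, goal.position)
    if gap == 0.0:
        return ContactIntention((0.0, 0.0, 0.0), goal)
    direction = tuple((g - p) / gap for p, g in zip(pose.position, goal.position))
    advance = min(step, gap)  # bounded subgoal step toward the final goal
    subgoal_pos = tuple(p + advance * d for p, d in zip(pose.position, direction))
    # Contact on the side facing away from the goal; assumes identity orientation.
    contact = tuple(-radius * d for d in direction)
    return ContactIntention(contact, Pose(subgoal_pos, goal.orientation))


def toy_controller(pose: Pose, intention: ContactIntention,
                   gain: float = 0.5) -> Pose:
    """Stand-in for the MPC (assumption): move a fraction toward the subgoal."""
    target = intention.subgoal.position
    new_pos = tuple(p + gain * (t - p) for p, t in zip(pose.position, target))
    return Pose(new_pos, pose.orientation)


def run_hierarchy(high_level: Policy, low_level: Controller, pose: Pose,
                  goal: Pose, tol: float = 1e-2, max_intentions: int = 20,
                  control_steps: int = 50) -> Tuple[Pose, List[ContactIntention]]:
    """Alternate between choosing a contact intention and tracking its subgoal."""
    intentions: List[ContactIntention] = []
    for _ in range(max_intentions):
        if dist(pose.position, goal.position) < tol:
            break
        intention = high_level(pose, goal)  # where to touch, where to move
        intentions.append(intention)
        for _ in range(control_steps):      # how to realize it
            pose = low_level(pose, intention)
            if dist(pose.position, intention.subgoal.position) < tol:
                break
    return pose, intentions


final_pose, plan = run_hierarchy(toy_policy, toy_controller,
                                 Pose((0.0, 0.0, 0.0)), Pose((0.3, 0.0, 0.0)))
print(len(plan), final_pose.position)
```

The point of the sketch is the separation of concerns: the high level only ever emits a `ContactIntention`, so either level could in principle be swapped out without changing the other, which is the role the abstract assigns to the interface.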
