Dexterous hands enable concurrent prehensile and nonprehensile manipulation, such as holding one object while interacting with another, a capability essential for everyday tasks yet underexplored in robotics. Learning such long-horizon, contact-rich, multi-stage behaviors is challenging because demonstrations are expensive to collect and end-to-end policies require substantial data to generalize across varied object geometries and placements. We present DexMulti, a sample-efficient approach to real-world dexterous multi-task manipulation that decomposes demonstrations into object-centric skills with well-defined temporal boundaries. Rather than learning monolithic policies, our method follows a retrieve-align-execute paradigm: it retrieves demonstrated skills based on the current object geometry, aligns them to the observed object state using an uncertainty-aware estimator that tracks the object's centroid and yaw, and then executes them. We evaluate on three multi-stage tasks requiring concurrent manipulation (Grasp + Pull, Grasp + Open, and Grasp + Grasp) across two dexterous hands (Allegro and LEAP) in over 1,000 real-world trials. Our approach achieves an average success rate of 66% on training objects with only 3-4 demonstrations per object, outperforming diffusion policy baselines by 2-3x while requiring far fewer demonstrations. Results demonstrate robust generalization to held-out objects and to spatial variations of up to ±25 cm.
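The following is a minimal, hypothetical sketch of what a retrieve-align-execute loop of this kind could look like; it is not the authors' implementation. The skill library layout, the geometry descriptor, the nearest-neighbor retrieval, and the planar (centroid + yaw) realignment are all assumptions made purely for illustration.

```python
# Hypothetical sketch of a retrieve-align-execute loop (not the paper's code).
# Skill names, descriptors, and the SE(2) alignment below are illustrative assumptions.
import numpy as np


def retrieve_skill(skill_library, object_descriptor):
    """Pick the demonstrated skill whose object descriptor is closest (L2 distance)."""
    keys = list(skill_library.keys())
    dists = [np.linalg.norm(skill_library[k]["descriptor"] - object_descriptor)
             for k in keys]
    return skill_library[keys[int(np.argmin(dists))]]


def align_trajectory(traj_xy_yaw, demo_pose, observed_pose):
    """Re-express a demonstrated (x, y, yaw) trajectory in the observed object frame,
    using the estimated centroid translation and yaw offset."""
    dx, dy = observed_pose[:2] - demo_pose[:2]
    dyaw = observed_pose[2] - demo_pose[2]
    c, s = np.cos(dyaw), np.sin(dyaw)
    rot = np.array([[c, -s], [s, c]])
    # Rotate about the demonstrated centroid, then translate to the observed centroid.
    xy = (traj_xy_yaw[:, :2] - demo_pose[:2]) @ rot.T + demo_pose[:2] + np.array([dx, dy])
    yaw = traj_xy_yaw[:, 2:] + dyaw
    return np.hstack([xy, yaw])


# Toy usage: one demonstrated "grasp" skill, realigned to a shifted, rotated object.
library = {
    "grasp": {
        "descriptor": np.array([0.08, 0.05, 0.12]),   # e.g. bounding-box dimensions
        "demo_pose": np.array([0.30, 0.00, 0.0]),     # centroid x, y and yaw at demo time
        "trajectory": np.array([[0.30, 0.00, 0.0],
                                [0.32, 0.01, 0.1]]),
    }
}
skill = retrieve_skill(library, np.array([0.09, 0.05, 0.11]))
observed_pose = np.array([0.45, 0.10, 0.3])            # output of the pose estimator
aligned = align_trajectory(skill["trajectory"], skill["demo_pose"], observed_pose)
print(aligned)  # trajectory to hand off to the low-level controller
```

Under these assumptions, retrieval reduces to a nearest-neighbor lookup over object descriptors, and alignment is a rigid planar transform driven by the estimated centroid and yaw, which is why only those two quantities need to be tracked reliably.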