Dexterous manipulation remains one of the most challenging problems in robotics, requiring coordinated control of high-DoF hands and arms under complex, contact-rich dynamics. A major barrier is embodiment variability: different dexterous hands exhibit distinct kinematics and dynamics, forcing prior methods to train separate policies or rely on shared action spaces with per-embodiment decoder heads. We present DexFormer, an end-to-end, dynamics-aware cross-embodiment policy built on a modified transformer backbone that conditions on historical observations. By using temporal context to infer morphology and dynamics on the fly, DexFormer adapts to diverse hand configurations and produces embodiment-appropriate control actions. Trained on a variety of procedurally generated dexterous-hand assets, DexFormer acquires a generalizable manipulation prior and exhibits strong zero-shot transfer to Leap Hand, Allegro Hand, and Rapid Hand. Our results show that a single policy can generalize across heterogeneous hand embodiments, establishing a scalable foundation for cross-embodiment dexterous manipulation. Project website: https://davidlxu.github.io/DexFormer-web/