Dexterous manipulation policies today largely assume fixed hand designs, severely restricting their generalization to new embodiments with varied kinematic and structural layouts. To overcome this limitation, we introduce a parameterized canonical representation that unifies a broad spectrum of dexterous hand architectures. It comprises a unified parameter space and a canonical URDF format, offering three key advantages. 1) The parameter space captures essential morphological and kinematic variations for effective conditioning in learning algorithms. 2) A structured latent manifold can be learned over this space, where interpolations between embodiments yield smooth and physically meaningful morphology transitions. 3) The canonical URDF standardizes the action space while preserving the dynamic and functional properties of the original URDFs, enabling efficient and reliable cross-embodiment policy learning. We validate these advantages through extensive analysis and experiments, including grasp policy replay, VAE latent encoding, and cross-embodiment zero-shot transfer. Specifically, we train a VAE on the unified representation to obtain a compact, semantically rich latent embedding, and develop a grasping policy conditioned on the canonical representation that generalizes across dexterous hands. We demonstrate, through simulation and real-world tasks on unseen morphologies (e.g., an 81.9% zero-shot success rate on a 3-finger LEAP Hand), that our framework unifies both the representational and action spaces of structurally diverse hands, providing a scalable foundation for cross-hand learning toward universal dexterous manipulation.
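To make the idea of a unified parameter space concrete, the sketch below shows one plausible (hypothetical) encoding: each hand is packed into a fixed-length vector with a validity mask over fingers, and embodiments can be linearly interpolated in that space, illustrating the smooth morphology transitions described above. All dimensions, link lengths, and function names here are illustrative assumptions, not the paper's actual parameterization.

```python
import numpy as np

# Hypothetical unified parameter space: every hand occupies a fixed-length
# vector covering up to MAX_FINGERS fingers; absent fingers are zeroed and
# masked out. The per-finger parameters (three link lengths) are illustrative.
MAX_FINGERS = 5
PARAMS_PER_FINGER = 3  # e.g. proximal / middle / distal link lengths (meters)

def encode_hand(link_lengths_per_finger):
    """Pack a variable-finger hand into the canonical vector plus a finger mask."""
    vec = np.zeros(MAX_FINGERS * PARAMS_PER_FINGER)
    mask = np.zeros(MAX_FINGERS, dtype=bool)
    for i, lengths in enumerate(link_lengths_per_finger):
        vec[i * PARAMS_PER_FINGER:(i + 1) * PARAMS_PER_FINGER] = lengths
        mask[i] = True
    return vec, mask

def interpolate(vec_a, vec_b, t):
    """Linear interpolation in parameter space gives a smooth morphology blend."""
    return (1.0 - t) * vec_a + t * vec_b

# Two toy embodiments: a 3-finger and a 4-finger hand with made-up link lengths.
hand3, mask3 = encode_hand([[0.05, 0.03, 0.02]] * 3)
hand4, mask4 = encode_hand([[0.06, 0.04, 0.025]] * 4)
mid = interpolate(hand3, hand4, 0.5)  # halfway morphology between the two hands
```

In the paper's actual pipeline, a VAE would be trained on such vectors to obtain the compact latent embedding, with interpolation performed in the learned latent space rather than raw parameter space; the raw-space version above only conveys the representational idea.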