Generalizable grasping with high-degree-of-freedom (DoF) dexterous hands remains challenging in tiered workspaces, where occlusion, narrow clearances, and height-dependent constraints are substantially stronger than in open tabletop scenes. Most existing methods are evaluated in relatively unoccluded settings and typically do not explicitly model the distinct control requirements of arm navigation and hand articulation under spatial constraints. We present SpaceDex, a hierarchical framework for dexterous manipulation in constrained 3D environments. At the high level, a Vision-Language Model (VLM) planner parses user intent, reasons about occlusion and height relations across multiple camera views, and generates target bounding boxes for zero-shot segmentation and mask tracking. This stage provides structured spatial guidance for downstream control instead of relying on single-view target selection. At the low level, we introduce an arm-hand Feature Separation Network that decouples global trajectory control for the arm from geometry-aware grasp mode selection for the hand, reducing feature interference between reaching and grasping objectives. The controller further integrates multi-view perception, fingertip tactile sensing, and a small set of recovery demonstrations to improve robustness to partial observability and off-nominal contacts. In 100 real-world trials involving over 30 unseen objects across four categories, SpaceDex achieves a 63.0% success rate, compared with 39.0% for a strong tabletop baseline. These results indicate that combining hierarchical spatial planning with arm-hand representation decoupling improves dexterous grasping performance in spatially constrained environments.
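The arm-hand decoupling described in the abstract can be illustrated with a minimal sketch. This is not the SpaceDex architecture, whose details the abstract does not give; it only shows the general idea of routing global scene features to an arm branch and object-geometry features to a hand branch, so that neither output depends on the other branch's input. The class name `ArmHandSeparatedPolicy`, the grasp mode names, the feature dimensions, and the random untrained weights are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass

# Hypothetical grasp modes; the abstract does not list the actual ones.
GRASP_MODES = ("pinch", "tripod", "power")


def linear(weights, bias, x):
    """Dense layer y = W x + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def init_layer(rng, n_out, n_in):
    """Random (untrained) weights and zero biases, for illustration only."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


@dataclass
class Action:
    arm_delta: list   # assumed 3D end-effector displacement (x, y, z)
    grasp_mode: str   # discrete grasp mode chosen for the hand


class ArmHandSeparatedPolicy:
    """Two independent branches: no features are shared between arm and hand."""

    def __init__(self, global_dim, geom_dim, hidden=8, seed=0):
        rng = random.Random(seed)
        # Arm branch: global scene features -> trajectory step.
        self.arm_enc = init_layer(rng, hidden, global_dim)
        self.arm_head = init_layer(rng, 3, hidden)
        # Hand branch: object geometry features -> grasp mode logits.
        self.hand_enc = init_layer(rng, hidden, geom_dim)
        self.hand_head = init_layer(rng, len(GRASP_MODES), hidden)

    def act(self, global_feat, geom_feat):
        arm_latent = relu(linear(*self.arm_enc, global_feat))
        hand_latent = relu(linear(*self.hand_enc, geom_feat))
        arm_delta = linear(*self.arm_head, arm_latent)
        logits = linear(*self.hand_head, hand_latent)
        mode = GRASP_MODES[logits.index(max(logits))]
        return Action(arm_delta=arm_delta, grasp_mode=mode)


if __name__ == "__main__":
    policy = ArmHandSeparatedPolicy(global_dim=4, geom_dim=3)
    action = policy.act([0.1, 0.2, 0.3, 0.4], [1.0, 0.0, 0.5])
    print(action.arm_delta, action.grasp_mode)
```

Because the two branches share no parameters or latents in this sketch, changing the geometry features leaves the arm output unchanged, and vice versa. A real system would presumably train both branches jointly on demonstrations and may still share an upstream perception backbone.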