Closing the Reality Gap: Zero-Shot Sim-to-Real Deployment for Dexterous Force-Based Grasping and Manipulation

Multi-fingered, human-like dexterous hands offer human-level manipulation capabilities, but training control policies that deploy directly on real hardware remains difficult because of contact-rich physics and imperfect actuation. We close this gap with a practical sim-to-real reinforcement learning (RL) framework that combines dense tactile feedback with joint torque sensing to explicitly regulate physical interactions. To enable effective sim-to-real transfer, we introduce (i) a computationally fast tactile simulation that computes distances between dense virtual tactile units and the object via parallel forward kinematics, providing the high-rate, high-resolution touch signals needed by RL; (ii) a current-to-torque calibration that eliminates the need for torque sensors on dexterous hands by mapping motor current to joint torque; and (iii) actuator dynamics modeling that bridges the actuation gap by randomizing non-ideal effects such as backlash and torque-speed saturation. Using an asymmetric actor-critic PPO pipeline trained entirely in simulation, our policies deploy directly to a five-finger hand. The resulting policies demonstrate two essential skills: (1) command-based, controllable grasp force tracking and (2) in-hand object reorientation, both executed robustly without fine-tuning on the robot. By combining tactile and torque signals in the observation space with effective sensing and actuation modeling, our system provides a practical route to reliable dexterous manipulation. To our knowledge, this is the first demonstration of controllable grasping on a multi-finger dexterous hand trained entirely in simulation and transferred zero-shot to real hardware.
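
To make contribution (iii) concrete, the following is a minimal sketch of how backlash and torque-speed saturation might be randomized per episode in a simulated actuator. It is an illustration under stated assumptions, not the authors' model: the linear torque-speed envelope, the dead-zone ("play") backlash model, the names `ActuatorParams`, `sample_params`, `saturate_torque`, and `backlash_step`, and all numeric ranges are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class ActuatorParams:
    """Hypothetical per-episode actuator parameters (not from the paper)."""
    stall_torque: float   # N*m, maximum torque at zero speed
    no_load_speed: float  # rad/s, speed at which driving torque drops to zero
    backlash: float       # rad, total width of the gear dead zone


def sample_params(rng: random.Random) -> ActuatorParams:
    """Draw one randomized actuator; the ranges are illustrative assumptions."""
    return ActuatorParams(
        stall_torque=rng.uniform(0.8, 1.2),
        no_load_speed=rng.uniform(8.0, 12.0),
        backlash=rng.uniform(0.0, 0.02),
    )


def saturate_torque(tau_cmd: float, speed: float, p: ActuatorParams) -> float:
    """Clip a commanded torque to an assumed linear torque-speed envelope."""
    # Torque available in the direction of motion shrinks linearly with speed;
    # torque opposing the motion is limited by the stall torque.
    upper = p.stall_torque * min(1.0, max(0.0, 1.0 - speed / p.no_load_speed))
    lower = -p.stall_torque * min(1.0, max(0.0, 1.0 + speed / p.no_load_speed))
    return min(max(tau_cmd, lower), upper)


def backlash_step(motor_pos: float, joint_pos: float, p: ActuatorParams) -> float:
    """Advance the joint position through a dead-zone (play) backlash model."""
    half_gap = p.backlash / 2.0
    if motor_pos - joint_pos > half_gap:
        return motor_pos - half_gap   # gear engaged on the forward face
    if motor_pos - joint_pos < -half_gap:
        return motor_pos + half_gap   # gear engaged on the reverse face
    return joint_pos                  # inside the gap: the joint does not move
```

In a training loop, one would presumably call `sample_params` at each episode reset and route every policy torque command through `saturate_torque`, so the policy never learns to rely on torque the real motor cannot deliver at speed.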
