Multi-fingered, human-like dexterous hands promise human-level manipulation capabilities, but training control policies that can be deployed directly on real hardware remains difficult because of contact-rich physics and imperfect actuation. We close this gap with a practical sim-to-real reinforcement learning (RL) framework that combines dense tactile feedback with joint torque sensing to explicitly regulate physical interactions. To enable effective sim-to-real transfer, we introduce (i) a computationally efficient tactile simulation that computes distances between dense virtual tactile units and the object via parallel forward kinematics, providing the high-rate, high-resolution touch signals that RL requires; (ii) a current-to-torque calibration that maps motor current to joint torque, eliminating the need for dedicated torque sensors on dexterous hands; and (iii) actuator dynamics modeling that bridges the actuation gap by randomizing non-ideal effects such as backlash and torque-speed saturation. Using an asymmetric actor-critic PPO pipeline trained entirely in simulation, our policies deploy directly on a five-fingered hand. The resulting policies demonstrate two essential skills: (1) command-based, controllable grasp-force tracking and (2) in-hand object reorientation, both executed robustly without any fine-tuning on the robot. By combining tactile and torque signals in the observation space with effective sensing and actuation modeling, our system offers a practical path to reliable dexterous manipulation. To our knowledge, this is the first demonstration of controllable grasping on a multi-fingered dexterous hand trained entirely in simulation and transferred zero-shot to real hardware.
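As a rough illustration of components (i) to (iii), the following Python sketch shows one possible implementation under simplifying assumptions that are not stated in the abstract. It treats the object as a sphere with an analytic signed distance and assumes link poses come from a forward-kinematics routine that is not shown. It also assumes a linear current-to-torque map (tau = k * i + b), a linear DC-motor torque-speed envelope, and a play-operator backlash model. All function names and parameter ranges are hypothetical and are not taken from the authors' code.

```python
import numpy as np


# --- (i) Tactile simulation: distances from virtual taxels to the object ---
def taxel_signals(link_rot, link_pos, taxel_local, taxel_link,
                  obj_center, obj_radius, d_max=0.005):
    """Return one activation in [0, 1] per virtual taxel, computed in one batch.

    link_rot:    (L, 3, 3) world rotations of the hand links (assumed from FK)
    link_pos:    (L, 3)    world positions of the link origins
    taxel_local: (T, 3)    taxel positions expressed in their parent link frame
    taxel_link:  (T,)      index of each taxel's parent link
    The object is assumed to be a sphere, so its signed distance is analytic.
    """
    R = link_rot[taxel_link]                                    # (T, 3, 3)
    p = np.einsum("tij,tj->ti", R, taxel_local) + link_pos[taxel_link]
    dist = np.linalg.norm(p - obj_center, axis=1) - obj_radius  # signed distance
    # 1 at (or inside) the surface, falling linearly to 0 at d_max away.
    return np.clip(1.0 - dist / d_max, 0.0, 1.0)


# --- (ii) Current-to-torque calibration (assumed linear model) -------------
def fit_current_to_torque(current, torque):
    """Least-squares fit of tau = k * i + b from paired calibration samples."""
    current = np.asarray(current, dtype=float)
    A = np.stack([current, np.ones_like(current)], axis=1)
    (k, b), *_ = np.linalg.lstsq(A, np.asarray(torque, dtype=float), rcond=None)
    return k, b


# --- (iii) Actuator model with randomized non-ideal effects ----------------
def sample_actuator_params(rng):
    """Draw per-episode actuator parameters (illustrative ranges only)."""
    return {
        "tau_stall": rng.uniform(0.8, 1.2),   # N m, stall torque
        "omega_max": rng.uniform(8.0, 12.0),  # rad/s, no-load speed
        "backlash": rng.uniform(0.0, 0.02),   # rad, total gear play
    }


def saturate_torque(tau_cmd, omega, tau_stall, omega_max):
    """Clip commanded torque to a symmetric linear torque-speed envelope."""
    tau_limit = tau_stall * max(0.0, 1.0 - abs(omega) / omega_max)
    return min(max(tau_cmd, -tau_limit), tau_limit)


def backlash_step(motor_angle, output_angle, backlash):
    """Play-operator backlash: the output moves only once the gap is taken up."""
    half = 0.5 * backlash
    return min(max(output_angle, motor_angle - half), motor_angle + half)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = sample_actuator_params(rng)
    tau = saturate_torque(2.0, omega=3.0, tau_stall=params["tau_stall"],
                          omega_max=params["omega_max"])
    print(params, tau)
```

In a training loop, one would presumably resample the actuator parameters at each episode reset and apply `backlash_step` and `saturate_torque` at every simulation step. The single vectorized `taxel_signals` call is what allows thousands of taxels to be evaluated per step without a per-contact physics query.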