Multi-fingered, human-like dexterous hands offer human-level manipulation capabilities, but training control policies that deploy directly on real hardware remains difficult because of contact-rich physics and imperfect actuation. We close this gap with a practical sim-to-real reinforcement learning (RL) framework that uses dense tactile feedback together with joint torque sensing to explicitly regulate physical interactions. To enable effective sim-to-real transfer, we introduce (i) a computationally fast tactile simulation that computes distances between dense virtual tactile units and the object via parallel forward kinematics, providing the high-rate, high-resolution touch signals that RL needs; (ii) a current-to-torque calibration that maps motor current to joint torque, eliminating the need for torque sensors on dexterous hands; and (iii) actuator dynamics modeling that bridges the actuation gap by randomizing non-ideal effects such as backlash and torque-speed saturation. Using an asymmetric actor-critic PPO pipeline trained entirely in simulation, our policies deploy directly to a five-finger hand. The resulting policies demonstrate two essential skills: (1) command-based, controllable grasp force tracking and (2) in-hand object reorientation, both executed robustly without fine-tuning on the robot. By combining tactile and torque signals in the observation space with effective sensing and actuation modeling, our system provides a practical route to reliable dexterous manipulation. To our knowledge, this is the first demonstration of controllable grasping on a multi-finger dexterous hand trained entirely in simulation and transferred zero-shot to real hardware.
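To make the idea behind the tactile simulation in (i) concrete, the following is a minimal sketch of a distance-based virtual taxel computation. It is not the authors' implementation: the planar two-link finger, the circular object, and the names `link_pose` and `taxel_distances` are all assumptions introduced here for illustration. The abstract does not say how distances are converted into touch signals, so the sketch stops at signed distances.

```python
import math

def link_pose(joint_angles, link_lengths, link_index):
    """Planar forward kinematics (hypothetical helper).

    Returns the origin (x, y) and heading of the frame attached to the
    given link, for a serial chain whose base sits at the world origin.
    """
    x = y = heading = 0.0
    for i in range(link_index + 1):
        heading += joint_angles[i]
        if i < link_index:
            # Advance along link i to reach the next joint.
            x += link_lengths[i] * math.cos(heading)
            y += link_lengths[i] * math.sin(heading)
    return x, y, heading

def taxel_distances(joint_angles, link_lengths, taxels, center, radius):
    """Signed distance from each virtual taxel to a circular object.

    `taxels` is a list of (link_index, (local_x, local_y)) pairs giving
    each taxel's position in its link frame. Negative values mean the
    taxel lies inside the object. The circular object is an assumption
    made only to keep the distance query closed-form.
    """
    distances = []
    for link_index, (lx, ly) in taxels:
        ox, oy, h = link_pose(joint_angles, link_lengths, link_index)
        # Transform the taxel from the link frame to the world frame.
        wx = ox + lx * math.cos(h) - ly * math.sin(h)
        wy = oy + lx * math.sin(h) + ly * math.cos(h)
        distances.append(math.hypot(wx - center[0], wy - center[1]) - radius)
    return distances

if __name__ == "__main__":
    # Straight finger along +x, one taxel at the middle of each link.
    taxels = [(0, (0.5, 0.0)), (1, (0.5, 0.0))]
    print(taxel_distances([0.0, 0.0], [1.0, 1.0], taxels, (0.5, 1.0), 0.5))
```

Each taxel is evaluated independently of the others, which is what makes this kind of computation easy to batch. A practical version would presumably vectorize the loop across taxels and simulated environments instead of iterating in Python.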