Mastering dexterous manipulation with multi-fingered hands has been a grand challenge in robotics for decades. Despite the potential of such hands, the difficulty of collecting high-quality data remains a primary bottleneck for high-precision tasks. While reinforcement learning combined with sim-to-real transfer offers a promising alternative, the transferred policies often fail on tasks demanding millimeter-scale precision, such as bimanual piano playing. In this work, we introduce HandelBot, a framework that combines a simulation-trained policy with rapid real-world adaptation through a two-stage pipeline. Starting from the simulation-trained policy, we first apply a structured refinement stage that corrects spatial misalignments by adjusting lateral finger joints based on physical rollouts. Next, we use residual reinforcement learning to autonomously learn fine-grained corrective actions. Through extensive hardware experiments across five well-known songs, we demonstrate that HandelBot can successfully perform precise bimanual piano playing. Our system outperforms direct simulation deployment by a factor of 1.8 and requires only 30 minutes of physical interaction data.
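As a rough illustration of the residual reinforcement learning idea mentioned above, the sketch below shows one common way a learned correction can be added to a base policy's joint targets. It is a generic residual-policy pattern, not HandelBot's actual implementation: the function name `compose_residual_action`, the `residual_scale` parameter, and the normalized joint limits are all assumptions made for this example.

```python
from typing import List, Sequence


def compose_residual_action(
    base_action: Sequence[float],
    residual: Sequence[float],
    residual_scale: float = 0.05,  # hypothetical bound on the correction size
    joint_low: float = -1.0,       # assumed normalized lower joint limit
    joint_high: float = 1.0,       # assumed normalized upper joint limit
) -> List[float]:
    """Add a small, bounded correction to the base policy's joint targets."""
    if len(base_action) != len(residual):
        raise ValueError("base_action and residual must have the same length")

    composed = []
    for base, res in zip(base_action, residual):
        # Clamp the raw residual to [-1, 1] so that the correction can never
        # exceed residual_scale in magnitude.
        res = max(-1.0, min(1.0, res))
        target = base + residual_scale * res
        # Keep the final command inside the assumed joint limits.
        composed.append(max(joint_low, min(joint_high, target)))
    return composed
```

Keeping the residual small and bounded means the composed action stays close to the simulation-trained behavior, which is the usual motivation for learning a correction on real hardware rather than a full policy from scratch.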