PLUME: Probabilistic Latent Unified World Modeling and Parameter Estimation for Multi-Finger Manipulation

Dexterous manipulation with multi-finger hands can be sensitive to physical parameters such as object shape, pose, and friction coefficients. While simulation enables large-scale data collection with known parameter values, simulation-trained policies must still handle uncertainty at deployment, where the true parameters, and therefore the true dynamics, are unknown. Standard domain randomization strategies may be insufficient for precise tasks like screwdriver turning, as manipulation strategies may need to change depending on specific parameter values. To address this, we propose Probabilistic Latent Unified world Modeling and parameter Estimation (PLUME), a world model that jointly learns to evolve a belief over parameter values and to model the system dynamics conditioned on those parameters. We learn a latent space that jointly represents multiple qualitatively different physical parameters along with rewards, which are themselves functions of partially observable variables, to inform planning. Our novel learning framework efficiently aligns the world model with the true dynamics through online parameter inference rather than re-training or fine-tuning. We evaluate our method on simulated screwdriver turning, valve turning, bucket lifting, and disk flicking tasks, as well as a hardware screwdriver turning task, where we achieve successful zero-shot transfer of our simulation-trained policy and outperform state-of-the-art offline reinforcement learning and world-model-augmented behavior cloning baselines. Please see our website at https://plume-world-model.github.io for videos.
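
To make the idea of evolving a belief over parameter values through online inference concrete, the following is a minimal sketch under strong simplifying assumptions. It is not the PLUME model: instead of a learned latent world model, it uses a hand-written toy dynamics function (a pushed block slowed by Coulomb friction) and a discrete Bayes filter over a grid of candidate friction coefficients. All names (`step`, `update_belief`, `estimate_friction`), the dynamics, and the noise model are hypothetical and chosen only for illustration.

```python
import math
import random

G = 9.81   # gravitational acceleration in m/s^2
DT = 0.05  # integration time step in seconds (assumed)


def step(velocity, push_accel, mu):
    """Toy dynamics (assumed): forward-sliding block with Coulomb friction mu."""
    return velocity + DT * (push_accel - mu * G)


def update_belief(belief, candidates, velocity, push_accel, observed_next, noise_std):
    """One Bayes update: reweight each candidate mu by its Gaussian likelihood."""
    posterior = []
    for prior, mu in zip(belief, candidates):
        error = observed_next - step(velocity, push_accel, mu)
        likelihood = math.exp(-0.5 * (error / noise_std) ** 2)
        posterior.append(prior * likelihood)
    total = sum(posterior)
    if total == 0.0:
        # Every candidate underflowed to zero; keep the previous belief.
        return list(belief)
    return [p / total for p in posterior]


def estimate_friction(true_mu=0.6, steps=200, noise_std=0.02, seed=0):
    """Simulate noisy observations and refine the belief online, with no retraining."""
    rng = random.Random(seed)
    candidates = [0.1 * k for k in range(1, 11)]        # grid over mu: 0.1 .. 1.0
    belief = [1.0 / len(candidates)] * len(candidates)  # uniform prior
    velocity = 1.0
    for _ in range(steps):
        # Pushes are strong enough that the block keeps sliding forward,
        # so the friction term never needs to change sign.
        push_accel = rng.uniform(6.0, 10.0)
        observed_next = step(velocity, push_accel, true_mu) + rng.gauss(0.0, noise_std)
        belief = update_belief(belief, candidates, velocity, push_accel,
                               observed_next, noise_std)
        velocity = observed_next
    return candidates, belief


if __name__ == "__main__":
    candidates, belief = estimate_friction()
    for mu, p in zip(candidates, belief):
        print(f"mu={mu:.1f}  belief={p:.3f}")
```

In this sketch the dynamics function stays fixed and only the belief changes as observations arrive, which mirrors, in a much simpler setting, the contrast the abstract draws between online parameter inference and re-training or fine-tuning.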
