Manipulating deformable objects is a ubiquitous task in household environments. Because such objects have infinitely many degrees of freedom, the task demands both an adequate representation and accurate dynamics prediction. This work proposes DeformNet, which addresses these challenges through latent-space modeling with a learned 3D representation model. The representation model combines a PointNet encoder with a conditional neural radiance field (NeRF), allowing it to capture object deformations and variations in lighting conditions. To model the complex dynamics, we employ a recurrent state-space model (RSSM) that accurately predicts how the latent representation evolves over time. Extensive simulation experiments with diverse objectives demonstrate that DeformNet generalizes across various deformable object manipulation tasks, even to previously unseen goals. Finally, we deploy DeformNet on a real UR5 robotic arm to demonstrate its capability in real-world scenarios.
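To make the pipeline described in the abstract concrete, the following is a minimal sketch of how a PointNet-style encoder could feed a recurrent state-space model that rolls latents forward under a sequence of actions. It is not the authors' implementation: all dimensions, weight names, and function names (`encode_points`, `rssm_step`, `imagine`) are hypothetical, the weights are random and untrained, the GRU-based transition and Gaussian latents are assumed as a common way to build an RSSM, and the conditional NeRF decoder is omitted entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the abstract does not specify any of these.
POINT_DIM, EMBED_DIM, HIDDEN_DIM, LATENT_DIM, ACTION_DIM = 3, 32, 64, 16, 4


def dense(n_in, n_out):
    """Random weight matrix and zero bias (untrained placeholder)."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


GRU_IN = LATENT_DIM + ACTION_DIM
W_pt, b_pt = dense(POINT_DIM, EMBED_DIM)               # shared per-point layer
W_z, b_z = dense(GRU_IN + HIDDEN_DIM, HIDDEN_DIM)      # GRU update gate
W_r, b_r = dense(GRU_IN + HIDDEN_DIM, HIDDEN_DIM)      # GRU reset gate
W_c, b_c = dense(GRU_IN + HIDDEN_DIM, HIDDEN_DIM)      # GRU candidate state
W_prior, b_prior = dense(HIDDEN_DIM, 2 * LATENT_DIM)   # p(z_t | h_t)
W_post, b_post = dense(HIDDEN_DIM + EMBED_DIM, 2 * LATENT_DIM)  # q(z_t | h_t, e_t)


def encode_points(points):
    """PointNet-style encoder: shared per-point layer, then a max-pool
    over points, which makes the embedding independent of point order."""
    features = np.maximum(points @ W_pt + b_pt, 0.0)   # (N, EMBED_DIM)
    return features.max(axis=0)                        # (EMBED_DIM,)


def gru_step(h, x):
    """One GRU update of the deterministic state h given input x."""
    hx = np.concatenate([x, h])
    z = sigmoid(hx @ W_z + b_z)
    r = sigmoid(hx @ W_r + b_r)
    c = np.tanh(np.concatenate([x, r * h]) @ W_c + b_c)
    return (1.0 - z) * h + z * c


def sample_gaussian(params):
    """Split params into mean and raw std, then draw a reparameterized sample."""
    mean, raw_std = np.split(params, 2)
    std = np.log1p(np.exp(raw_std)) + 1e-4             # softplus keeps std > 0
    return mean + std * rng.normal(size=mean.shape)


def rssm_step(h, z, action, embed=None):
    """Advance the deterministic state, then sample the stochastic latent
    from the prior (no observation) or the posterior (with an embedding)."""
    h = gru_step(h, np.concatenate([z, action]))
    if embed is None:
        z = sample_gaussian(h @ W_prior + b_prior)
    else:
        z = sample_gaussian(np.concatenate([h, embed]) @ W_post + b_post)
    return h, z


def imagine(points, actions):
    """Encode one observed point cloud, then roll the latent forward
    through the prior for each action without further observations."""
    h, z = np.zeros(HIDDEN_DIM), np.zeros(LATENT_DIM)
    h, z = rssm_step(h, z, np.zeros(ACTION_DIM), embed=encode_points(points))
    latents = []
    for action in actions:
        h, z = rssm_step(h, z, action)
        latents.append(z)
    return np.stack(latents)                           # (T, LATENT_DIM)
```

Calling `imagine(points, actions)` with an `(N, 3)` point cloud and a `(T, 4)` action array returns a `(T, 16)` array of predicted latents. In a full system of the kind the abstract describes, such latents would be decoded by the conditional NeRF and compared against a goal; that step is left out here.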