We propose a new dataset and a novel approach to learning hand-object interaction priors for hand and articulated object pose estimation. We first collect a dataset using visual teleoperation, in which a human operator manipulates articulated objects directly inside a physics simulator. We record these interactions and obtain accurate annotations of object poses and contact information from the simulator at no additional cost. Our system requires only an iPhone to record human hand motion, so it scales easily and greatly lowers the cost of data and annotation collection. With this data, we learn 3D interaction priors: a discriminator (trained as part of a GAN) that captures the distribution of how object parts are arranged, and a diffusion model that generates contact regions on articulated objects to guide hand pose estimation. These structural and contact priors transfer to real-world data with barely any domain gap. Using our data and learned priors, our method significantly improves joint hand and articulated object pose estimation over existing state-of-the-art methods. The project is available at https://zehaozhu.github.io/ContactArt/ .