Recovering detailed interactions between humans/hands and objects is an appealing yet challenging task. Existing methods typically use template-based representations to track the human/hand and the object during interaction. Despite their progress, they fail to handle invisible contact surfaces. In this paper, we propose Ins-HOI, an end-to-end solution for recovering human/hand-object interactions via instance-level implicit reconstruction. To this end, we introduce an instance-level occupancy field that represents the human/hand and the object simultaneously, and a complementary training strategy that handles the lack of instance-level ground truth. Such a representation enables learning a contact prior implicitly from sparse observations. During complementary training, we augment the real-captured data with synthesized data, created by randomly composing individual scans of humans/hands and objects while intentionally allowing penetration. In this way, our network learns to recover individual shapes as completely as possible from the synthesized data, while remaining aware of the contact constraints and overall plausibility from the real-captured scans. As demonstrated in experiments, Ins-HOI produces reasonable and realistic non-visible contact surfaces even in cases of extremely close interaction. To facilitate research on this task, we collect a large-scale, high-fidelity 3D scan dataset comprising 5.2k high-quality scans of real-world human-chair and hand-object interactions. We will release our dataset and source code. Data examples and video results of our method can be found on the project page.
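
The synthesized half of the complementary training can be illustrated with a toy sketch. This is not the authors' implementation. It assumes analytic spheres as stand-ins for individual scans, and the helper names (`sphere_occupancy`, `compose_synthetic_sample`) and all numeric settings are hypothetical. It only shows why a random composition that allows penetration still yields per-instance occupancy labels at every query point, including points inside the overlap.

```python
import numpy as np


def sphere_occupancy(points, center, radius):
    """Return 1.0 for points inside the sphere, 0.0 otherwise."""
    inside = np.linalg.norm(points - center, axis=-1) <= radius
    return inside.astype(np.float32)


def compose_synthetic_sample(rng, num_points=2048, max_offset=1.5):
    """Randomly place a toy 'object' near a toy 'human' and label query points.

    Hypothetical stand-ins: a unit sphere plays the human/hand scan and a
    smaller sphere plays the object scan. Real scans would be meshes.
    """
    human_center, human_radius = np.zeros(3), 1.0
    object_radius = 0.5

    # Random placement. Offsets smaller than human_radius + object_radius
    # make the two shapes interpenetrate, which is intentionally allowed.
    direction = rng.normal(size=3)
    direction /= np.linalg.norm(direction)
    object_center = direction * rng.uniform(0.0, max_offset)

    # Query points sampled uniformly in a box around both shapes.
    points = rng.uniform(-2.0, 2.0, size=(num_points, 3))

    # Instance-level labels: one occupancy channel per instance, so a point
    # inside the overlap is labeled occupied in both channels.
    occ_human = sphere_occupancy(points, human_center, human_radius)
    occ_object = sphere_occupancy(points, object_center, object_radius)
    labels = np.stack([occ_human, occ_object], axis=-1)  # shape (N, 2)
    return points, labels, object_center


rng = np.random.default_rng(0)
points, labels, object_center = compose_synthetic_sample(rng)

# A single holistic occupancy (what an instance-agnostic scan provides)
# is the union of the two channels.
union = labels.max(axis=-1)
```

In this toy setup the two-channel `labels` array plays the role of the instance-level ground truth that synthesized compositions provide, while `union` mimics the holistic surface that a real scan of an interaction offers without separating the instances.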