This paper addresses a novel task: anticipating 3D human-object interactions (HOIs). Most existing work on HOI synthesis does not cover whole-body interactions with dynamic objects and is often limited to manipulating small or static objects. Our task is significantly more challenging, as it requires modeling dynamic objects of various shapes, capturing whole-body motion, and ensuring physically valid interactions. To this end, we propose InterDiff, a framework comprising two key steps: (i) interaction diffusion, where we leverage a diffusion model to encode the distribution of future human-object interactions; and (ii) interaction correction, where we introduce a physics-informed predictor to correct denoised HOIs within a diffusion step. Our key insight is to inject the prior knowledge that interactions, when expressed in a reference frame defined by the contact points, follow a simple pattern and are easily predictable. Experiments on multiple human-object interaction datasets demonstrate that our method produces realistic, vivid, and remarkably long-term 3D HOI predictions.
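The two steps named in the abstract can be illustrated with a toy sketch. This is not the InterDiff implementation: the function names, the constant-offset "predictor", the linear noise level, and the stand-in denoiser are all assumptions made for illustration, and only object translation (not rotation or human pose) is modeled. The sketch shows a simplified sampling loop in which each denoised estimate is corrected in a contact-point reference frame, where the object motion is assumed to follow a simple pattern.

```python
import numpy as np


def to_contact_frame(obj_pos, contact_pos):
    """Express object positions relative to the contact point, frame by frame."""
    return obj_pos - contact_pos


def from_contact_frame(rel_pos, contact_pos):
    """Map contact-relative positions back to world coordinates."""
    return rel_pos + contact_pos


def correct_object_motion(obj_pos, contact_pos, rest_offset):
    """Toy correction step (assumption): in the contact frame, the object is
    predicted to keep a constant offset from the contact point."""
    rel = to_contact_frame(obj_pos, contact_pos)
    # Hypothetical "simple pattern": the same offset at every frame.
    predicted_rel = np.broadcast_to(rest_offset, rel.shape)
    return from_contact_frame(predicted_rel, contact_pos)


def sample_with_correction(denoise_fn, correct_fn, shape, num_steps, rng):
    """Simplified sampler: denoise, correct the clean estimate, then re-noise.
    The linear noise level is an assumption, not a real diffusion schedule."""
    x = rng.standard_normal(shape)
    for t in reversed(range(num_steps)):
        x0_hat = denoise_fn(x, t)      # estimate of the clean sequence
        x0_hat = correct_fn(x0_hat)    # correction applied to the estimate
        sigma = t / num_steps          # reaches 0 at the final step
        x = x0_hat + sigma * rng.standard_normal(shape)
    return x


rng = np.random.default_rng(0)
num_frames = 5
# Hypothetical contact-point (e.g. hand) trajectory in world coordinates.
contact = np.cumsum(rng.standard_normal((num_frames, 3)), axis=0)
offset = np.array([0.1, 0.0, 0.0])


def denoise(x, t):
    # Stand-in for a learned denoiser; it only shrinks the input.
    return 0.5 * x


def correct(x0_hat):
    return correct_object_motion(x0_hat, contact, offset)


pred = sample_with_correction(denoise, correct, (num_frames, 3), 10, rng)
```

Because the final step adds no noise, the returned trajectory in this toy setup keeps the assumed offset from the contact point at every frame. A learned predictor would replace the constant-offset rule.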