Humans naturally change their environment through interactions, e.g., by opening doors or moving furniture. To reproduce such interactions in virtual spaces (e.g., the metaverse), we need to capture and model them, including changes in the scene geometry, ideally from egocentric input alone (a head camera and body-worn inertial sensors). While the head camera can be used to localize the person in the scene, estimating dynamic object pose is much more challenging. Because the object is often not visible from the head camera (e.g., a person does not look at a chair while sitting down), we cannot rely on visual object pose estimation. Instead, our key observation is that human motion tells us a lot about scene changes. Motivated by this, we present iReplica, the first human-object interaction reasoning method that can track objects and scene changes based solely on human motion. iReplica is an essential first step towards advanced AR/VR applications in immersive virtual universes and can provide human-centric training data to teach machines to interact with their surroundings. Our code, data, and model will be available on our project page at http://virtualhumans.mpi-inf.mpg.de/ireplica/