Humans naturally change their environment through interactions, e.g., by opening doors or moving furniture. To reproduce such interactions in virtual spaces (e.g., the metaverse), we need to capture and model them, including changes in the scene geometry, ideally from egocentric input alone (a head camera and body-worn inertial sensors). While the head camera can be used to localize the person in the scene, estimating dynamic object pose is much more challenging. Because the object is often not visible from the head camera (e.g., a person does not look at a chair while sitting down), we cannot rely on visual object pose estimation. Instead, our key observation is that human motion tells us a lot about scene changes. Motivated by this, we present iReplica, the first human-object interaction reasoning method that can track objects and scene changes based solely on human motion. iReplica is an essential first step towards advanced AR/VR applications in immersive virtual universes and can provide human-centric training data to teach machines to interact with their surroundings. Our code, data, and model will be available on our project page at http://virtualhumans.mpi-inf.mpg.de/ireplica/