Generating full-body avatar motion that is both plausible and accurate is key to the quality of immersive experiences in mixed reality. Head-Mounted Devices (HMDs) typically provide only a few input signals, such as the 6-DoF signals of the head and hands. Recent approaches have achieved impressive performance in generating full-body motion from head and hand signals alone. However, to the best of our knowledge, all existing approaches rely on full hand visibility. While this holds when, e.g., motion controllers are used, a considerable proportion of mixed reality experiences do not involve motion controllers and instead rely on egocentric hand tracking. This introduces the challenge of partial hand visibility owing to the restricted field of view of the HMD. In this paper, we propose HMD-NeMo, the first unified approach that addresses plausible and accurate full-body motion generation even when the hands may be only partially visible. HMD-NeMo is a lightweight neural network that predicts full-body motion in an online, real-time fashion. At the heart of HMD-NeMo is a spatio-temporal encoder with novel temporally adaptable mask tokens that encourage plausible motion in the absence of hand observations. We perform an extensive analysis of the impact of the different components of HMD-NeMo and, through our evaluation, establish a new state of the art on the AMASS dataset.