Full-body avatar presence is crucial for immersive social and environmental interactions in digital reality. However, current devices provide only three six-degree-of-freedom (6-DOF) poses, from the headset and two controllers (i.e., three-point trackers). Inferring full-body pose from these inputs is a highly under-constrained problem and therefore challenging, especially when the method must support the full range of body proportions and use cases found in the general population. In this paper, we propose a deep learning framework, DivaTrack, which outperforms existing methods when applied to diverse body sizes and activities. We augment the sparse three-point inputs with linear accelerations from Inertial Measurement Units (IMUs) to improve foot contact prediction. We then condition the otherwise ambiguous lower-body pose on the predicted foot contacts and upper-body pose in a two-stage model. We further stabilize the inferred full-body pose across a wide range of configurations by learning to blend predictions computed in two reference frames, each designed for a different type of motion. We demonstrate the effectiveness of our design on a large dataset that captures 22 subjects performing locomotion that is challenging for three-point tracking, including lunges, hula-hooping, and sitting. As shown in a live demo using the Meta VR headset and Xsens IMUs, our method runs in real time while accurately tracking a user's motion as they perform a diverse set of movements.