Immersive virtual reality (VR) applications demand accurate, temporally coherent full-body pose tracking. Recent head-mounted camera-based approaches show promise in egocentric pose estimation, but face challenges when applied to VR head-mounted displays (HMDs), including temporal instability, inaccurate lower-body estimation, and a lack of real-time performance. To address these limitations, we present EgoPoseVR, an end-to-end framework for accurate egocentric full-body pose estimation in VR that integrates headset motion cues with egocentric RGB-D observations through a dual-modality fusion pipeline. A spatiotemporal encoder extracts frame- and joint-level representations, which are fused via cross-attention to exploit complementary motion cues across modalities. A kinematic optimization module then imposes constraints from HMD signals, enhancing the accuracy and stability of pose estimation. To facilitate training and evaluation, we introduce a large-scale synthetic dataset of over 1.8 million temporally aligned HMD and RGB-D frames across diverse VR scenarios. Experimental results show that EgoPoseVR outperforms state-of-the-art egocentric pose estimation models. A user study in real-world scenes further shows that EgoPoseVR achieves significantly higher subjective ratings in accuracy, stability, embodiment, and intention for future use compared to baseline methods. These results indicate that EgoPoseVR enables robust full-body pose tracking, offering a practical solution for accurate VR embodiment without requiring additional body-worn sensors or room-scale tracking systems.
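The cross-attention fusion mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the EgoPoseVR implementation: the single-head formulation, the residual connection, the token counts (`T`, `J`), the feature width `d`, and all names (`hmd_tokens`, `rgbd_tokens`, `cross_attention`) are assumptions chosen for illustration. It only shows how tokens from one modality (HMD motion) can query tokens from another (RGB-D joint features).

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (T, d) tokens from one modality (here: HMD motion, assumed).
    context: (J, d) tokens from the other modality (here: RGB-D, assumed).
    Returns the attended features (T, d) and the attention weights (T, J).
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each query row sums to 1
    return weights @ v, weights


# Hypothetical sizes: 8 HMD frame tokens, 17 joint tokens, 32-dim features.
rng = np.random.default_rng(0)
T, J, d = 8, 17, 32
hmd_tokens = rng.standard_normal((T, d))
rgbd_tokens = rng.standard_normal((J, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

attended, weights = cross_attention(hmd_tokens, rgbd_tokens, w_q, w_k, w_v)
fused = hmd_tokens + attended  # residual connection (an assumption)
```

Here each HMD token receives a weighted mixture of the RGB-D joint features, so the fused representation keeps the temporal layout of the headset stream while drawing on visual evidence. A real model would typically use multiple heads, learned projections, and attention in both directions.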