EgoPoseVR: Spatiotemporal Multi-Modal Reasoning for Egocentric Full-Body Pose in Virtual Reality

Immersive virtual reality (VR) applications demand accurate, temporally coherent full-body pose tracking. Recent approaches based on head-mounted cameras show promise for egocentric pose estimation but face challenges when applied to VR head-mounted displays (HMDs), including temporal instability, inaccurate lower-body estimation, and a lack of real-time performance. To address these limitations, we present EgoPoseVR, an end-to-end framework for accurate egocentric full-body pose estimation in VR that integrates headset motion cues with egocentric RGB-D observations through a dual-modality fusion pipeline. A spatiotemporal encoder extracts frame- and joint-level representations, which are fused via cross-attention to exploit complementary motion cues across modalities. A kinematic optimization module then imposes constraints derived from HMD signals, improving the accuracy and stability of pose estimation. To facilitate training and evaluation, we introduce a large-scale synthetic dataset of over 1.8 million temporally aligned HMD and RGB-D frames across diverse VR scenarios. Experimental results show that EgoPoseVR outperforms state-of-the-art egocentric pose estimation models. A user study in real-world scenes further shows that EgoPoseVR received significantly higher subjective ratings than baseline methods for accuracy, stability, embodiment, and intention for future use. These results indicate that EgoPoseVR enables robust full-body pose tracking, offering a practical solution for accurate VR embodiment without additional body-worn sensors or room-scale tracking systems.
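
The abstract names two mechanisms without specifying them: cross-attention fusion of the two modalities' features, and a kinematic step constrained by HMD signals. The following is a minimal sketch under stated assumptions, not EgoPoseVR's implementation. It assumes RGB-D joint tokens act as queries and HMD motion tokens as keys and values (the attention direction is an assumption), and it reduces the kinematic step to a rigid translation that pins the predicted head joint to the HMD position. The function names `cross_attention` and `anchor_to_hmd` and the token layouts are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Vector) -> Vector:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product cross-attention (hypothetical token layout).

    queries: one feature vector per RGB-D joint token (assumed).
    keys/values: one feature vector per HMD motion token (assumed).
    Returns one fused vector per query: a softmax-weighted mix of the values.
    """
    d = len(keys[0])
    fused = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        fused.append([sum(w * v[i] for w, v in zip(weights, values))
                      for i in range(len(values[0]))])
    return fused


def anchor_to_hmd(joints: Matrix, head_index: int, hmd_position: Vector) -> Matrix:
    """Rigidly translate the skeleton so its head joint matches the HMD position.

    A crude stand-in for a kinematic constraint (assumption): the HMD position
    is treated as ground truth for the head, and the same offset is applied to
    every joint, so bone vectors are preserved.
    """
    offset = [h - j for h, j in zip(hmd_position, joints[head_index])]
    return [[c + o for c, o in zip(joint, offset)] for joint in joints]


if __name__ == "__main__":
    joint_tokens = [[1.0, 0.0], [0.0, 1.0]]      # hypothetical RGB-D joint features
    hmd_tokens = [[1.0, 0.0], [0.0, 1.0]]        # hypothetical HMD motion features
    hmd_values = [[10.0, 0.0], [0.0, 10.0]]
    print(cross_attention(joint_tokens, hmd_tokens, hmd_values))

    skeleton = [[0.0, 1.6, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]  # head, pelvis, foot
    print(anchor_to_hmd(skeleton, 0, [0.1, 1.7, 0.0]))
```

Each query ends up weighted toward the HMD tokens it is most similar to, which is the sense in which cross-attention lets one modality draw on the other's cues. A real kinematic optimization would also solve for joint rotations and temporal smoothness rather than applying a single translation.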

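The IEEE Virtual Reality conference has long been the premier international venue for presenting research across the broad field of virtual reality (VR), including augmented reality (AR), mixed reality (MR), and 3D user interfaces, and it seeks high-quality original papers. Each paper should be classified as primarily covering research, applications, or systems, according to the following guidelines. Research papers should describe results that contribute to advances in software, hardware, algorithms, interaction, or human factors. Application papers should explain how the authors built on existing ideas and applied them in novel ways to solve interesting problems. Each paper should include an evaluation of the successful use of VR/AR/MR in the given application domain. dblp listing: http://dblp.uni-trier.de/db/conf/vr/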