Exploring Context-aware and LLM-driven Locomotion for Immersive Virtual Reality

Locomotion plays a crucial role in shaping the user experience within virtual reality environments. In particular, hands-free locomotion offers a valuable alternative by supporting accessibility and freeing users from reliance on handheld controllers. However, traditional speech-based methods often depend on rigid command sets, limiting the naturalness and flexibility of interaction. In this study, we propose a novel locomotion technique powered by large language models (LLMs), which allows users to navigate virtual environments using natural language with contextual awareness. We evaluate three locomotion methods: controller-based teleportation, voice-based steering, and our LLM-driven approach. Our evaluation combines eye-tracking data analysis, including exploratory explainable machine learning analysis with SHAP, and standardized questionnaires (SUS, IPQ, CSQ-VR, NASA-TLX) to examine user experience through both objective gaze-based measures and subjective self-reports of usability, presence, cybersickness, and cognitive load. Our findings show no statistically significant differences in usability, presence, or cybersickness between LLM-driven locomotion and established methods such as teleportation, suggesting its potential as a viable, natural language-based, hands-free alternative. In addition, eye-tracking analysis revealed patterns suggesting a tendency toward increased user attention and engagement in the LLM-driven condition. Complementing these findings, exploratory SHAP analysis revealed that fixation, saccade, and pupil-related features vary across techniques, indicating distinct patterns of visual attention and cognitive processing. Overall, we conclude that our method can facilitate hands-free locomotion in virtual spaces, particularly in support of accessibility.
