The advent of immersive Virtual Reality (VR) applications has transformed various domains, yet their integration with advanced artificial intelligence technologies such as Visual Language Models (VLMs) remains underexplored. This study introduces an approach that uses VLMs within VR environments to enhance user interaction and task efficiency. Built on the Unity engine and a custom-developed VLM, our system supports real-time, intuitive interaction through natural language, without relying on visual text instructions. Speech-to-text and text-to-speech components let the user and the VLM communicate by voice, enabling the system to guide users through complex tasks. Preliminary experimental results indicate that using VLMs not only reduces task completion times but also improves user comfort and task engagement compared to traditional VR interaction methods.