Immersive Virtual Reality (VR) applications have transformed many domains, yet their integration with advanced artificial intelligence technologies such as Visual Language Models (VLMs) remains underexplored. This study introduces a novel approach that uses VLMs within VR environments to enhance user interaction and task efficiency. Built on the Unity engine and a custom-developed VLM, our system supports real-time, intuitive user interaction through natural language processing, without relying on visual text instructions. Speech-to-text and text-to-speech components let the user and the VLM communicate by voice, enabling the system to guide users through complex tasks. Preliminary experimental results indicate that using VLMs reduces task completion times and improves user comfort and task engagement compared with traditional VR interaction methods.