Recent developments in computer graphics, machine learning, and sensor technologies open up numerous opportunities for extended reality (XR) setups in everyday life, from skills training to entertainment. With large corporations offering affordable consumer-grade head-mounted displays (HMDs), XR is likely to become pervasive, and HMDs are likely to become personal devices like smartphones and tablets. However, intelligent spaces and naturalistic interactions in XR are as important as technological advances for keeping users engaged in virtual and augmented spaces. To this end, large language model (LLM)-powered non-player characters (NPCs), combined with speech-to-text (STT) and text-to-speech (TTS) models, offer significant advantages over conventional or pre-scripted NPCs for enabling more natural conversational user interfaces (CUIs) in XR. In this paper, we provide the community with CUIfy, an open-source, customizable, extensible, and privacy-aware Unity package that facilitates speech-based NPC-user interaction with various LLM, STT, and TTS models. Our package also supports multiple LLM-powered NPCs per environment and minimizes the latency between the different computational models through streaming, achieving usable interactions between users and NPCs. We publish our source code in the following repository: https://gitlab.lrz.de/hctl/cuify
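The abstract mentions streaming between the computational models to minimize latency. The following is a minimal, hypothetical Python sketch (not CUIfy's actual API, which is a Unity/C# package) of the underlying idea: rather than waiting for the LLM's full response before invoking TTS, streamed tokens are grouped into sentences and each sentence is forwarded to TTS as soon as it is complete, reducing the time until the NPC starts speaking. All function names here are stand-ins.

```python
import re

def fake_llm_stream(prompt):
    # Stand-in for a token-streaming LLM endpoint (hypothetical).
    for token in "Hello there traveler. How can I help you today?".split():
        yield token + " "

def sentences_from_stream(token_stream):
    """Group streamed tokens into sentences so TTS can start early."""
    buffer = ""
    for token in token_stream:
        buffer += token
        # Emit a sentence as soon as terminal punctuation is followed by space.
        match = re.search(r"(.+?[.!?])\s+", buffer)
        if match:
            yield match.group(1)
            buffer = buffer[match.end():]
    if buffer.strip():
        yield buffer.strip()

def speak(sentence):
    # Stand-in for a TTS call; in practice this would synthesize and play audio.
    return f"[TTS] {sentence}"

if __name__ == "__main__":
    for sentence in sentences_from_stream(fake_llm_stream("greet the user")):
        print(speak(sentence))
```

With this pattern, the first sentence reaches TTS while the LLM is still generating the rest, which is the latency benefit the streaming design targets.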