Large Multimodal Models (LMMs) have shown strong potential for assisting users with tasks such as programming, content creation, and information access, yet interaction with them remains largely confined to traditional interfaces such as desktops and smartphones. Meanwhile, advances in mixed reality (MR) hardware have enabled applications that extend beyond entertainment into everyday use. However, most existing MR systems rely primarily on manual input (e.g., hand gestures or controllers) and offer limited intelligent assistance due to a lack of integration with large-scale AI models. We present Reality Copilot, a voice-first human-AI assistant for mixed reality that leverages LMMs to enable natural speech-based interaction. The system supports contextual understanding of the physical environment, realistic 3D content generation, and real-time information retrieval. Beyond in-headset interaction, Reality Copilot facilitates cross-platform workflows by generating context-aware textual content and exporting generated assets. This work explores the design space of LMM-powered human-AI collaboration in mixed reality.