XR devices running chatbots powered by Large Language Models (LLMs) have tremendous potential as always-on agents that can enable far more productive scenarios. However, screen-based chatbots do not take advantage of the full suite of natural inputs available in XR, including inward-facing sensor data; instead, they over-rely on explicit voice or text prompts, sometimes paired with multi-modal data attached to the query. We propose a solution that leverages an attention framework to derive context implicitly from user actions, eye gaze, and contextual memory within the XR environment. This minimizes the need for engineered explicit prompts, fostering grounded and intuitive interactions that glean user insights for the chatbot. Our user studies demonstrate the feasibility and transformative potential of our approach to streamline user interaction with chatbots in XR, while offering insights for the design of future XR-embodied LLM agents.