Developing chatbots as personal companions has long been a goal of artificial intelligence researchers. Recent advances in Large Language Models (LLMs) have delivered a practical solution for endowing chatbots with anthropomorphic language capabilities. However, LLMs alone are not enough for a chatbot to act as a companion. Humans use their understanding of an individual's personality to drive conversations, and chatbots need the same capability to offer human-like companionship: they should act on personalized, real-time, and time-evolving knowledge of their owner. We define this essential knowledge as the \textit{common ground} between a chatbot and its owner, and we propose to build a common-ground-aware dialogue system, named \textit{OS-1}, from an LLM-based module to enable chatbot companionship. Hosted on eyewear, OS-1 senses the visual and audio signals the user receives and extracts real-time contextual semantics. These semantics are categorized and recorded to form historical contexts, from which the user's profile is distilled and evolves over time; that is, OS-1 gradually learns about its user. OS-1 combines knowledge from real-time semantics, historical contexts, and the user-specific profile to produce a common-ground-aware prompt that is fed into the LLM module. The LLM's output is converted to audio and spoken to the wearer when appropriate. We conduct laboratory and in-field studies to assess OS-1's ability to build common ground between the chatbot and its user, and we also evaluate the technical feasibility and capabilities of the system. With its common-ground awareness, OS-1 can significantly improve user satisfaction and may support downstream tasks such as personal emotional support and assistance.
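The prompt-construction step described above (combining real-time semantics, historical contexts, and the user profile into one LLM input) can be illustrated with a minimal Python sketch. This is not OS-1's implementation: the `CommonGround` class, the `build_prompt` function, the `max_history` parameter, and the section labels in the prompt are all hypothetical names chosen for illustration, and the abstract does not specify the actual prompt format.

```python
from dataclasses import dataclass, field


@dataclass
class CommonGround:
    """Hypothetical container for the three knowledge sources named in the abstract."""
    realtime_semantics: list[str] = field(default_factory=list)   # what the eyewear senses now
    historical_context: list[str] = field(default_factory=list)   # recorded past contexts, oldest first
    user_profile: dict[str, str] = field(default_factory=dict)    # distilled, slowly evolving facts


def build_prompt(ground: CommonGround, user_utterance: str, max_history: int = 3) -> str:
    """Assemble a plain-text prompt from the three sources (illustrative format only)."""
    # Sort profile keys so the prompt is deterministic for a given profile.
    profile = "; ".join(f"{k}: {v}" for k, v in sorted(ground.user_profile.items()))
    # Keep only the most recent history entries; guard against max_history <= 0.
    recent = ground.historical_context[-max_history:] if max_history > 0 else []
    sections = [
        "User profile: " + (profile or "unknown"),
        "Relevant history: " + (" | ".join(recent) or "none"),
        "Current scene: " + (" | ".join(ground.realtime_semantics) or "none"),
        "User says: " + user_utterance,
    ]
    return "\n".join(sections)
```

The returned string would then be passed to whichever LLM the system uses. A real system would presumably select history by relevance rather than recency; recency is used here only to keep the sketch short.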