Conventional Voice Assistants (VAs) rely on traditional language models to discern user intent and respond to user queries, leading to interactions that often lack broader contextual understanding, an area in which Large Language Models (LLMs) excel. However, current LLMs are largely designed for text-based interaction, so it remains unclear how user interactions will evolve when the modality shifts to voice. In this work, we investigate whether LLMs can enrich VA interactions via an exploratory study with participants (N=20) using a ChatGPT-powered VA for three scenarios (medical self-diagnosis, creative planning, and discussion) with varied constraints, stakes, and objectivity. We observe that the LLM-powered VA elicits richer interaction patterns that vary across tasks, demonstrating its versatility. Notably, the LLM absorbs the majority of VA intent-recognition failures. We additionally discuss the potential of harnessing LLMs for more resilient and fluid user-VA interactions and provide design guidelines for tailoring LLMs for voice assistance.