Conventional Voice Assistants (VAs) rely on traditional language models to discern user intent and respond to queries, which often yields interactions that lack broader contextual understanding, an area in which Large Language Models (LLMs) excel. However, current LLMs are designed largely for text-based interaction, so it remains unclear how user interactions will evolve when the modality changes to voice. In this work, we investigate whether LLMs can enrich VA interactions through an exploratory study in which participants (N=20) used a ChatGPT-powered VA in three scenarios (medical self-diagnosis, creative planning, and debate) that vary in constraints, stakes, and objectivity. We observe that the LLM-powered VA elicits richer interaction patterns that vary across tasks, demonstrating its versatility. Notably, the LLM absorbs the majority of VA intent recognition failures. We further discuss the potential of harnessing LLMs for more resilient and fluid user-VA interactions and provide design guidelines for tailoring LLMs to voice assistance.