Voice assistants have sharply risen in popularity in recent years, but their use has been limited mostly to simple applications like music, hands-free search, or control of internet-of-things devices. What would it take for voice assistants to guide people through more complex tasks? In our work, we study the limitations of the dominant approach voice assistants take to complex task guidance: reading aloud written instructions. Using recipes as an example, we observe twelve participants cook at home with a state-of-the-art voice assistant. We learn that the current approach leads to nine challenges, including obscuring the bigger picture, overwhelming users with too much information, and failing to communicate affordances. Instructions delivered by a voice assistant are especially difficult because they cannot be skimmed as easily as written instructions. Alexa in particular did not surface crucial details to the user or answer questions well. We draw on our observations to propose eight ways in which voice assistants can ``rewrite the script'' -- summarizing, signposting, splitting, elaborating, volunteering, reordering, redistributing, and visualizing -- to transform written sources into forms that are readily communicated through spoken conversation. We conclude with a vision of how modern advancements in natural language processing can be leveraged for intelligent agents to guide users effectively through complex tasks.