Despite large gains in natural language understanding driven by large language models in recent years, voice assistants still often fail to meet user expectations. In this study, we conducted a mixed-methods analysis of how voice assistant failures affect users' trust in their voice assistants. To illustrate how users experience these failures, we contribute a crowdsourced dataset of 199 voice assistant failures, categorized into 12 failure sources. Drawing on interview and survey data, we find that certain failures, such as those caused by overcapturing users' input, erode user trust more than others. We also examine how failures affect users' willingness to rely on voice assistants for future tasks. After a failure, users often stop using their voice assistant for that specific task for a short period before resuming similar usage. We demonstrate the importance of low-stakes tasks, such as playing music, for rebuilding trust after failures.