Voice assistants (VAs) like Siri and Alexa are transforming human-computer interaction; however, they lack awareness of users' spatiotemporal context, resulting in limited performance and unnatural dialogue. We introduce GazePointAR, a fully functional context-aware VA for wearable augmented reality that leverages eye gaze, pointing gestures, and conversation history to disambiguate speech queries. With GazePointAR, users can ask "what's over there?" or "how do I solve this math problem?" simply by looking and/or pointing. We evaluated GazePointAR in a three-part lab study (N=12): (1) comparing GazePointAR to two commercial systems; (2) examining GazePointAR's pronoun disambiguation across three tasks; and (3) an open-ended phase where participants could suggest and try their own context-sensitive queries. Participants appreciated the naturalness and human-like nature of pronoun-driven queries, although pronoun use was sometimes counter-intuitive. We then iterated on GazePointAR and conducted a first-person diary study examining how GazePointAR performs in the wild. We conclude by enumerating limitations and design considerations for future context-aware VAs.