Advanced multimodal AI agents can now collaborate with users to solve real-world challenges. Yet these emerging contextual AI systems rely on explicit communication channels between the user and the system. We hypothesize that implicit communication of the user's interests and intent would reduce friction and improve the user experience in contextual AI. In this work, we explore the potential of wearable eye tracking to convey user attention to the agents. We measure the eye tracking signal quality required to map gaze traces to physical objects effectively, then conduct experiments in which visual scanpath history is provided as additional context when querying multimodal agents. Our results show that eye tracking is a valuable user attention signal that can convey information about the user's current task and interests to the agent.
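The two steps named in the abstract, mapping gaze samples to objects and passing scanpath history to an agent as context, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the work itself: the `Box` type, the function names, the `margin` parameter (a simple stand-in for tolerance to gaze error), the smallest-box tie-break, and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical axis-aligned bounding box of an object in the scene image."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float


def gaze_to_object(gx, gy, boxes, margin=0.0):
    """Return the label of the smallest box containing the gaze point, or None.

    `margin` widens every box on all sides, an assumed way to tolerate
    gaze estimation error.
    """
    hits = [
        b for b in boxes
        if b.x0 - margin <= gx <= b.x1 + margin
        and b.y0 - margin <= gy <= b.y1 + margin
    ]
    if not hits:
        return None
    # Prefer the smallest box so a mug on a table resolves to the mug.
    return min(hits, key=lambda b: (b.x1 - b.x0) * (b.y1 - b.y0)).label


def scanpath_visits(samples, boxes, margin=0.0):
    """Collapse (t_ms, x, y) gaze samples into (label, duration_ms) visits.

    Samples that hit no object are dropped and do not split a visit.
    """
    visits = []  # each entry: [label, start_ms, end_ms]
    for t_ms, x, y in samples:
        label = gaze_to_object(x, y, boxes, margin)
        if label is None:
            continue
        if visits and visits[-1][0] == label:
            visits[-1][2] = t_ms  # extend the current visit
        else:
            visits.append([label, t_ms, t_ms])
    return [(label, end - start) for label, start, end in visits]


def build_prompt(question, visits):
    """Prepend the scanpath history to the user's question as plain text."""
    history = ", ".join(f"{label} ({ms} ms)" for label, ms in visits)
    return f"Objects the user recently looked at, in order: {history}.\nQuestion: {question}"


boxes = [Box("table", 0, 0, 100, 100), Box("mug", 10, 10, 20, 20)]
samples = [(0, 15, 15), (100, 16, 14), (200, 50, 50), (300, 200, 200), (400, 55, 50)]
visits = scanpath_visits(samples, boxes)
prompt = build_prompt("What is this for?", visits)
```

The resulting `prompt` string would be sent along with the user's query to whichever multimodal agent is in use. A real system would likely work from detected fixations rather than raw samples and from 3D object geometry rather than 2D boxes.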