During multi-party interactions, gaze direction is a key indicator of interest and intent, making it essential for social robots to direct their attention appropriately. Understanding the social context is crucial for robots to engage effectively, predict human intentions, and navigate interactions smoothly. This study develops an empirical motion-time pattern of human gaze behavior across various social situations (e.g., entering, leaving, waving, talking, and pointing) using deep neural networks trained on participant data. We created two video clips, one for a computer screen and another for a virtual reality headset, depicting different social scenarios. Data were collected from 30 participants: 15 using an eye tracker and 15 using an Oculus Quest 1 headset. Deep learning models, specifically Long Short-Term Memory (LSTM) networks and Transformers, were used to analyze and predict gaze patterns. Our models achieved 60% accuracy in predicting gaze direction in a 2D animation and 65% in a 3D animation. The best-performing model was then implemented on the Nao robot, and 36 new participants evaluated its performance. The feedback indicated overall satisfaction, with participants experienced in robotics rating the models more favorably.
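Sequence models such as LSTMs and Transformers consume gaze data as fixed-length windows of time-ordered samples. The sketch below illustrates one common way to frame a raw gaze stream into such windows; the window size, stride, field layout, and labeling scheme are illustrative assumptions, not the study's actual preprocessing pipeline.

```python
# Hypothetical sketch: framing raw gaze samples into fixed-length
# sequences for a sequence model (LSTM/Transformer). Window size,
# stride, and label scheme are illustrative assumptions only.

def make_windows(samples, labels, window=30, stride=10):
    """Slice a gaze stream into overlapping windows.

    samples: list of (x, y) gaze coordinates, one per frame
    labels:  per-frame gaze-target labels (e.g., which actor is fixated)
    Returns (sequence, label) pairs, labeling each window by its last frame.
    """
    pairs = []
    for start in range(0, len(samples) - window + 1, stride):
        seq = samples[start:start + window]
        pairs.append((seq, labels[start + window - 1]))
    return pairs

# Example: 100 frames of synthetic gaze data with two fixation targets
stream = [(i * 0.01, 0.5) for i in range(100)]
targets = ["left"] * 50 + ["right"] * 50
pairs = make_windows(stream, targets)
```

Labeling each window by its final frame lets the model predict the current gaze target from the preceding history, which matches how a robot would use the prediction online.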