Understanding human visual processing in dynamic environments is essential for psychology and human-centered interaction design. Mobile eye-tracking systems, which combine egocentric video with gaze signals, offer valuable insights into this processing. However, manual analysis of the resulting recordings is time-intensive. In this work, we present a novel human-centered learning algorithm for automated object recognition in mobile eye-tracking settings. Our approach combines an object detector with an inductive message-passing network (I-MPN) that exploits node features such as node profile information and positions. This combination lets the algorithm learn embedding functions that generalize to new viewing angles of an object, enabling rapid adaptation and efficient reasoning as users move through their environment. In experiments on three distinct video sequences, our \textit{interactive-based method} achieves significant performance improvements over fixed training/testing algorithms, even when trained on considerably fewer annotated samples collected through user feedback. It is also more efficient in data annotation than approaches that use complete object detectors, combine detectors with convolutional networks, or employ interactive video segmentation.
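To make the idea of inductive message passing over detections more concrete, the following is a minimal sketch and not the I-MPN architecture itself, whose details are not given here. It assumes that each detected object is a graph node whose feature vector concatenates a "profile" vector (for example detector class scores) with the box center, that nodes are linked when their centers lie within a hypothetical `radius`, and that a single mean-aggregation layer with shared weights is applied. All function names, dimensions, and the neighborhood rule are illustrative assumptions.

```python
import math
import random


def build_neighbors(centers, radius):
    """List, for each node, the other nodes whose centers lie within `radius`."""
    n = len(centers)
    return [
        [j for j in range(n)
         if j != i and math.dist(centers[i], centers[j]) <= radius]
        for i in range(n)
    ]


def matvec(vec, weight):
    """Multiply a vector of length d_in by a d_in x d_out weight matrix."""
    return [sum(v * row[k] for v, row in zip(vec, weight))
            for k in range(len(weight[0]))]


def message_passing_layer(feats, neighbors, w_self, w_neigh):
    """One mean-aggregation step followed by a ReLU.

    The same weights are applied to every node, so the layer can be reused
    on graphs of any size (the sense in which it is "inductive").
    """
    out = []
    for i, x in enumerate(feats):
        nbrs = neighbors[i]
        if nbrs:
            mean = [sum(feats[j][d] for j in nbrs) / len(nbrs)
                    for d in range(len(x))]
        else:
            mean = [0.0] * len(x)  # isolated node: no incoming message
        own = matvec(x, w_self)
        msg = matvec(mean, w_neigh)
        out.append([max(a + b, 0.0) for a, b in zip(own, msg)])
    return out


def embed_frame(profiles, centers, w_self, w_neigh, radius=0.3):
    """Embed all detections of one frame (assumed node layout: profile + center)."""
    feats = [list(p) + list(c) for p, c in zip(profiles, centers)]
    neighbors = build_neighbors(centers, radius)
    return message_passing_layer(feats, neighbors, w_self, w_neigh)


# Toy data: 4 profile dimensions + 2 position dimensions, 8 output dimensions.
rng = random.Random(0)
in_dim, out_dim = 6, 8
w_self = [[rng.gauss(0, 1) for _ in range(out_dim)] for _ in range(in_dim)]
w_neigh = [[rng.gauss(0, 1) for _ in range(out_dim)] for _ in range(in_dim)]


def random_frame(num_objects):
    """Random stand-in for detector output (not real data)."""
    profiles = [[rng.random() for _ in range(4)] for _ in range(num_objects)]
    centers = [(rng.random(), rng.random()) for _ in range(num_objects)]
    return profiles, centers


# The same weights embed frames with different numbers of detections.
frame_a = embed_frame(*random_frame(3), w_self, w_neigh)
frame_b = embed_frame(*random_frame(5), w_self, w_neigh)
print(len(frame_a), len(frame_b), len(frame_a[0]))
```

Because the weights are shared across nodes and independent of graph size, the same layer can be applied to a frame with a different number or arrangement of detections without retraining, which is the property the abstract relies on for new object views. In practice the weights would be learned from the user-provided annotations rather than sampled at random.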