This paper presents a new method for describing spatio-temporal relations between objects and hands in order to recognize both interactions and activities in video demonstrations of manual tasks. The approach uses scene graphs to extract key interaction features from image sequences while simultaneously encoding motion patterns and context. The method also introduces event-based automatic video segmentation and clustering, which group similar events and detect whether a monitored activity is executed correctly. The approach was evaluated in two multi-subject experiments, which showed that it can recognize and cluster hand-object and object-object interactions without prior knowledge of the activity, and that it can match the same activity performed by different subjects.
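As an illustration of event-based segmentation over scene graphs, the following minimal sketch may help. It is not the paper's implementation. It assumes that each frame's scene graph can be reduced to a set of (subject, relation, object) triples, that an event is a maximal run of frames with an identical graph, and that two events can be compared by Jaccard similarity. The names `segment_events` and `jaccard`, the relation labels, and the example frames are all hypothetical.

```python
from typing import FrozenSet, List, Tuple

# Assumed representation: one frame's scene graph is a set of
# (subject, relation, object) triples. The paper's graphs also carry
# motion patterns and context, which this sketch omits.
Triple = Tuple[str, str, str]
SceneGraph = FrozenSet[Triple]
Event = Tuple[int, int, SceneGraph]  # (first frame, last frame, graph)


def segment_events(frames: List[SceneGraph]) -> List[Event]:
    """Split a frame sequence into maximal runs of identical graphs."""
    events: List[Event] = []
    start = 0
    for i in range(1, len(frames) + 1):
        # Close the current run at the end of the video or when the graph changes.
        if i == len(frames) or frames[i] != frames[start]:
            events.append((start, i - 1, frames[start]))
            start = i
    return events


def jaccard(a: SceneGraph, b: SceneGraph) -> float:
    """Similarity of two events: shared triples over all triples."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical five-frame demonstration: a hand reaches a cup resting on a table.
touch = ("hand", "touches", "cup")
rest = ("cup", "on", "table")
frames = [
    frozenset(),
    frozenset({touch}),
    frozenset({touch}),
    frozenset({touch, rest}),
    frozenset(),
]
events = segment_events(frames)
similarity = jaccard(events[1][2], events[2][2])
```

Under these assumptions, the five frames split into four events, and the similarity between the second and third events could serve as the distance for a clustering step or for matching the same activity across subjects.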