This paper presents a new method for describing spatio-temporal relations between objects and hands, in order to recognize both interactions and activities in video demonstrations of manual tasks. The approach uses scene graphs to extract key interaction features from image sequences, encoding motion patterns and context at the same time. The method also introduces event-based automatic video segmentation and clustering, which groups similar events and detects on the fly whether a monitored activity is being executed correctly. The effectiveness of the approach was demonstrated in two multi-subject experiments, which showed that it can recognize and cluster hand-object and object-object interactions without prior knowledge of the activity, and can match the same activity performed by different subjects.
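To make the idea of event-based segmentation over scene graphs concrete, the following is a minimal sketch and not the authors' implementation. It assumes each frame has already been reduced to a set of `(subject, predicate, object)` relation triples, and that an event boundary occurs whenever that set changes between consecutive frames. The names `Relation`, `SceneGraph`, and `segment_events`, as well as the predicates `near`, `holds`, and `above`, are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical representation: one relation is (subject, predicate, object),
# and a per-frame scene graph is the set of relations observed in that frame.
Relation = Tuple[str, str, str]
SceneGraph = FrozenSet[Relation]


def segment_events(frames: List[SceneGraph]) -> List[Tuple[int, int, SceneGraph]]:
    """Group consecutive frames with identical relation sets into events.

    Returns a list of (first_frame, last_frame, scene_graph) tuples.
    Assumption: a change in the relation set marks an event boundary.
    """
    events: List[Tuple[int, int, SceneGraph]] = []
    start = 0
    for i in range(1, len(frames) + 1):
        # Close the current event at the end of the video or when the graph changes.
        if i == len(frames) or frames[i] != frames[start]:
            events.append((start, i - 1, frames[start]))
            start = i
    return events


# Toy sequence: a hand approaches a cup, grasps it, then lifts it off a table.
frames = [
    frozenset({("hand", "near", "cup")}),
    frozenset({("hand", "near", "cup")}),
    frozenset({("hand", "holds", "cup")}),
    frozenset({("hand", "holds", "cup"), ("cup", "above", "table")}),
    frozenset({("hand", "holds", "cup"), ("cup", "above", "table")}),
]

for first, last, graph in segment_events(frames):
    print(first, last, sorted(graph))
```

Under these assumptions, events with equal relation sets could then be grouped across recordings, which is one simple way to compare the same activity performed by different subjects. A real system would need tolerance to detection noise, which this exact-match sketch does not provide.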