Most existing group activity recognition methods construct spatial-temporal relations based solely on visual representations. Some methods introduce extra knowledge, such as action labels, to build semantic relations and use them to refine the visual representation. However, the knowledge they exploit remains at the semantic level, which is insufficient for achieving notable accuracy. In this paper, we propose to exploit knowledge concretization for group activity recognition and develop a novel Knowledge Augmented Relation Inference framework that can effectively use the concretized knowledge to improve individual representations. Specifically, the framework consists of a Visual Representation Module that extracts individual appearance features, a Knowledge Augmented Semantic Relation Module that explores semantic representations of individual actions, and a Knowledge-Semantic-Visual Interaction Module that integrates visual and semantic information through the knowledge. Benefiting from these modules, the proposed framework can utilize knowledge to enhance both the relation inference process and the individual representations, thus improving the performance of group activity recognition. Experimental results on two public datasets show that the proposed framework achieves competitive performance compared with state-of-the-art methods.