Most existing group activity recognition methods construct spatial-temporal relations based solely on visual representations. Some methods introduce extra knowledge, such as action labels, to build semantic relations and use them to refine the visual representations. However, the knowledge they exploit stays at the semantic level, which is insufficient for achieving notable accuracy. In this paper, we propose to exploit knowledge concretization for group activity recognition, and develop a novel Knowledge Augmented Relation Inference framework that can effectively use the concretized knowledge to improve individual representations. Specifically, the framework consists of a Visual Representation Module that extracts individual appearance features, a Knowledge Augmented Semantic Relation Module that explores semantic representations of individual actions, and a Knowledge-Semantic-Visual Interaction Module that integrates visual and semantic information through the knowledge. Benefiting from these modules, the proposed framework can utilize knowledge to enhance both the relation inference process and the individual representations, thus improving the performance of group activity recognition. Experimental results on two public datasets show that the proposed framework achieves competitive performance compared with state-of-the-art methods.