This paper proposes human-in-the-loop adaptation for Group Activity Feature Learning (GAFL) without group activity annotations. This adaptation is employed in a group-activity video retrieval framework to improve retrieval performance. Unlike prior work that classifies videos into pre-defined group activity classes in a supervised manner, our method first pre-trains the GAF space in a self-supervised manner based on the similarity of group activities. Our interactive fine-tuning process then updates the GAF space so that a user can better retrieve videos similar to the user's query videos. During this fine-tuning, our data-efficient video selection process selects several videos from a video database and presents them to the user, who manually labels them as positive or negative. These labeled videos are used to update (i.e., fine-tune) the GAF space through contrastive learning, so that the positive and negative videos move closer to and farther from the query videos, respectively. Comprehensive experiments on two team sports datasets validate that our method significantly improves retrieval performance. Ablation studies further demonstrate that each component of our human-in-the-loop adaptation contributes to this improvement. Code: https://github.com/chihina/GAFL-FINE-CVIU.
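The contrastive fine-tuning objective described above can be illustrated with a minimal sketch. This is not the paper's actual implementation; it assumes an InfoNCE-style loss over cosine similarities, where user-labeled positives are pulled toward the query feature and negatives are pushed away. The function name `contrastive_loss` and the temperature `tau` are illustrative choices, not taken from the source.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def contrastive_loss(query, positives, negatives, tau=0.1):
    """InfoNCE-style loss: lower when positives are close to the query
    and negatives are far from it in the feature space."""
    pos = np.array([np.exp(cosine(query, p) / tau) for p in positives])
    neg = np.array([np.exp(cosine(query, n) / tau) for n in negatives])
    denom = pos.sum() + neg.sum()
    # Average over positives; minimizing this pulls positives in,
    # pushes negatives out.
    return float(-np.log(pos / denom).mean())

# Toy 2-D features: a query, one aligned positive, one orthogonal negative.
q = np.array([1.0, 0.0])
loss_aligned = contrastive_loss(q, [np.array([0.9, 0.1])], [np.array([0.0, 1.0])])
loss_misaligned = contrastive_loss(q, [np.array([0.1, 0.9])], [np.array([0.0, 1.0])])
```

Minimizing such a loss over the GAF encoder's parameters moves positive videos closer to, and negative videos farther from, the query videos, which is the behavior the fine-tuning step aims for.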