In recent years, there has been growing interest in video-based action quality assessment (AQA). Most existing methods address AQA by processing the entire video while overlooking the inherent stage-level characteristics of actions. To address this issue, we design a novel Multi-stage Contrastive Regression (MCoRe) framework for the AQA task. By segmenting the input video into multiple stages or procedures, this approach extracts spatial-temporal information efficiently while reducing computational cost. Inspired by graph contrastive learning, we propose a new stage-wise contrastive learning loss function to enhance performance. As a result, MCoRe achieves state-of-the-art results on a widely adopted fine-grained AQA dataset.