Skill assessment in procedural videos is crucial for objectively evaluating human performance in settings such as manufacturing and everyday procedural tasks. Current research on skill assessment has focused predominantly on sports, and large-scale datasets for complex procedural activities are lacking. Existing studies typically cover only a limited number of actions and focus either on pairwise assessments (e.g., A is better than B) or on binary labels (e.g., good execution vs. needs improvement). To address these shortcomings, we introduce ProSkill, the first benchmark dataset for action-level skill assessment in procedural tasks. ProSkill provides absolute skill assessment annotations alongside pairwise ones. This is enabled by a novel, scalable annotation protocol that derives an absolute skill ranking from pairwise assessments. The protocol uses a Swiss Tournament scheme to make pairwise comparisons efficient, then aggregates them into consistent, continuous global scores with an Elo-based rating system. We use our dataset to benchmark the main state-of-the-art skill assessment algorithms, covering both ranking-based and pairwise paradigms. The suboptimal results achieved by current state-of-the-art methods highlight the challenges posed by ProSkill and thus its value for skill assessment in procedural videos. All data and code are available at https://fpv-iplab.github.io/ProSkill/
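To make the annotation protocol concrete, the following is a minimal sketch of Swiss-style pairing combined with Elo updates. It is not ProSkill's implementation: the pairing rule (pair neighbours in the current ranking, avoiding repeat matches when possible), the initial rating of 1500, the K-factor of 32, the number of rounds, and the `judge` function (a hypothetical stand-in for a human annotator comparing two executions) are all assumptions made for illustration.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0) -> Tuple[float, float]:
    """Zero-sum Elo update; score_a is 1.0 (A better), 0.0 (B better) or 0.5 (tie)."""
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta


def swiss_pairs(ratings: Dict[str, float], played: Set[FrozenSet[str]]) -> List[Tuple[str, str]]:
    """Assumed Swiss rule: pair items with similar ratings, skipping repeat matches if possible."""
    unpaired = sorted(ratings, key=lambda v: ratings[v], reverse=True)
    pairs = []
    while len(unpaired) >= 2:  # with an odd count, the last item gets a bye
        a = unpaired.pop(0)
        # Closest-ranked opponent not met before; fall back to the next one in line.
        idx = next((i for i, b in enumerate(unpaired) if frozenset((a, b)) not in played), 0)
        pairs.append((a, unpaired.pop(idx)))
    return pairs


def swiss_elo(items: List[str], judge: Callable[[str, str], float],
              rounds: int = 4, init: float = 1500.0, k: float = 32.0) -> Dict[str, float]:
    """Run Swiss rounds and fold every pairwise judgement into a global Elo score."""
    ratings = {v: init for v in items}
    played: Set[FrozenSet[str]] = set()
    for _ in range(rounds):
        for a, b in swiss_pairs(ratings, played):
            played.add(frozenset((a, b)))
            ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], judge(a, b), k)
    return ratings


# Usage with a hypothetical judge: each video has a hidden skill value and the
# "annotator" always prefers the more skilled execution (ties score 0.5).
hidden_skill = {f"v{i}": i for i in range(8)}


def judge(a: str, b: str) -> float:
    if hidden_skill[a] == hidden_skill[b]:
        return 0.5
    return 1.0 if hidden_skill[a] > hidden_skill[b] else 0.0


ratings = swiss_elo(list(hidden_skill), judge, rounds=4)
print(sorted(ratings, key=ratings.get, reverse=True))
```

Pairing items with similar current ratings is what keeps the number of comparisons low: each round needs only about n/2 judgements instead of all n(n-1)/2 pairs. Because each Elo update is zero-sum, the total rating mass stays constant, and with only a few rounds the resulting order is an approximation of the hidden ranking rather than an exact sort.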