Action quality assessment (AQA) applies computer vision to quantitatively assess the performance or execution of a human action. Current AQA approaches are end-to-end neural models, which lack transparency and tend to be biased because they are trained on subjective human judgements as ground truth. To address these issues, we introduce a neuro-symbolic paradigm for AQA, which uses neural networks to abstract interpretable symbols from video data and makes quality assessments by applying rules to those symbols. We take diving as our case study. Domain experts prefer our system and find it more informative than purely neural approaches to AQA in diving. Our system also achieves state-of-the-art action recognition and temporal segmentation, and it automatically generates a detailed report that breaks the dive down into its elements and provides objective scoring with visual evidence. As verified by a group of domain experts, this report may be used to assist judges in scoring, help train judges, and provide feedback to divers. We will open-source all of our annotated training data and code to support reproducibility.