AI-driven Action Quality Assessment (AQA) of sports videos can mimic Olympic judges, scoring performances as a second opinion or as a training aid. However, these AI methods are uninterpretable and do not justify their scores, even though such justification is important for algorithmic accountability. To account for their decisions, sports judges do not score subjectively; they apply a consistent set of criteria, a rubric, to the multiple actions in each performance sequence. We therefore propose IRIS, which performs Interpretable Rubric-Informed Segmentation on action sequences for AQA. We investigated IRIS for scoring videos of figure skating performances. IRIS predicts (1) action segments, (2) the technical element score difference of each segment relative to its base score, (3) multiple program component scores, and (4) the summed final score. In a modeling study, we found that IRIS performs better than non-interpretable, state-of-the-art models. In a formative user study, practicing figure skaters agreed with the rubric-informed explanations, found them useful, and trusted the AI judgments more. This work highlights the importance of using judgment rubrics to account for AI decisions.
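The four outputs listed in the abstract can be related to one another with a small sketch. This is not the IRIS model itself. It only illustrates, under stated assumptions, how per-segment score differences and program component scores might combine into a summed final score. The names `Segment`, `technical_score`, and `final_score`, the field names, and all numeric values are hypothetical. The sketch also assumes the final score is a plain sum with no deductions or component weighting, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One predicted action segment (hypothetical structure, for illustration)."""
    label: str         # rubric element name, e.g. a jump or a spin
    start_s: float     # segment start time in seconds
    end_s: float       # segment end time in seconds
    base_score: float  # rubric base score for this element
    score_diff: float  # predicted difference relative to the base score


def technical_score(segments: List[Segment]) -> float:
    """Sum of (base score + predicted difference) over all segments."""
    return sum(s.base_score + s.score_diff for s in segments)


def final_score(segments: List[Segment], component_scores: List[float]) -> float:
    """Assumed final score: technical score plus the program component scores.

    Deductions and component weighting are deliberately left out.
    """
    return technical_score(segments) + sum(component_scores)


# Made-up example values, not results from the paper.
segments = [
    Segment("jump", 10.0, 12.5, base_score=4.0, score_diff=0.5),
    Segment("spin", 30.0, 38.0, base_score=3.0, score_diff=-0.5),
]
component_scores = [8.0, 7.5]

tes = technical_score(segments)
total = final_score(segments, component_scores)
```

Keeping the per-segment base score and difference separate, instead of storing only a total, mirrors the interpretability argument: each contribution to the final score can be traced back to a specific segment and rubric item.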