Integrating deep learning with causal discovery has improved the interpretability of Temporal Action Segmentation (TAS). However, frame-level causal relationships contain substantial noise beyond the segment level, making them unable to directly express macro action semantics. We therefore propose the Causal Abstraction Segmentation Refiner (CASR), which refines the TAS results of various models by enhancing video causality through the marginalization of frame-level causal relationships. Specifically, we define equivalent frame-level and segment-level causal models, so that the causal adjacency matrix constructed from marginalized frame-level causal relationships can represent segment-level causal relationships. CASR works by reducing the difference between the causal adjacency matrix we construct and the pre-segmentation results of backbone models. In addition, we propose a novel evaluation metric, Causal Edit Distance (CED), to evaluate causal interpretability. Extensive experiments on mainstream datasets show that CASR significantly surpasses existing methods in action segmentation performance, as well as in causal explainability and generalization.
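To make the idea of marginalizing frame-level causal relationships into a segment-level causal adjacency matrix concrete, here is a minimal illustrative sketch. All names (`marginalize_adjacency`, `seg_of_frame`) and the simple sum-then-threshold aggregation rule are assumptions for illustration, not CASR's actual construction.

```python
import numpy as np

def marginalize_adjacency(frame_adj, seg_of_frame, n_segments):
    """Aggregate a frame-level causal adjacency matrix into a
    segment-level one: sum the edge weights between every pair of
    frames, grouped by the segments those frames belong to, then
    binarize. (Illustrative aggregation rule, not the paper's.)"""
    seg_adj = np.zeros((n_segments, n_segments))
    n = frame_adj.shape[0]
    for i in range(n):
        for j in range(n):
            seg_adj[seg_of_frame[i], seg_of_frame[j]] += frame_adj[i, j]
    # a segment-level causal edge exists if any aggregated weight is positive
    return (seg_adj > 0).astype(int)

# toy example: 4 frames; frames 0-1 form segment 0, frames 2-3 form segment 1
frame_adj = np.array([[0, 1, 0, 0],
                      [0, 0, 1, 0],
                      [0, 0, 0, 1],
                      [0, 0, 0, 0]])
seg_of_frame = [0, 0, 1, 1]
print(marginalize_adjacency(frame_adj, seg_of_frame, 2))
```

On this toy input, the frame edge from frame 1 to frame 2 becomes the single segment-level edge from segment 0 to segment 1, while within-segment frame edges collapse onto the diagonal.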