We introduce VIBA, a novel approach for explainable video classification that adapts Information Bottlenecks for Attribution (IBA) to video sequences. While most traditional explainability methods are designed for image models, VIBA addresses the need for explainability in the temporal models used for video analysis. To demonstrate its effectiveness, we apply VIBA to video deepfake detection, testing it on two architectures: the Xception model for spatial features and a VGG11-based model for capturing motion dynamics through optical flow. Using a custom dataset that reflects recent deepfake generation techniques, we adapt IBA to produce relevance and optical flow maps that visually highlight manipulated regions and motion inconsistencies. Our results show that VIBA generates temporally and spatially consistent explanations that align closely with human annotations, providing interpretability for video classification and for deepfake detection in particular.
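As a rough illustration of the underlying IBA idea (not the authors' implementation), the sketch below shows one common formulation: a learnable mask gates an intermediate feature map of a frame classifier, replacing suppressed activations with noise, and is optimized to preserve the prediction while restricting the information passed through. All names (IBABottleneck, explain_frame, beta) are illustrative, and the capacity term is a simplified proxy for the information-theoretic penalty used in IBA.

import torch
import torch.nn as nn
import torch.nn.functional as F

class IBABottleneck(nn.Module):
    """Gates an intermediate feature map with a learnable mask; suppressed
    activations are replaced by Gaussian noise (conceptual IBA sketch)."""
    def __init__(self, channels, height, width):
        super().__init__()
        # Unconstrained parameters; sigmoid maps them to a per-element mask in (0, 1).
        self.alpha = nn.Parameter(torch.full((1, channels, height, width), 5.0))

    def forward(self, feats):
        lam = torch.sigmoid(self.alpha)                      # 1 keeps the signal, 0 injects noise
        noise = torch.randn_like(feats) * feats.std() + feats.mean()
        return lam * feats + (1.0 - lam) * noise

    def capacity(self):
        # Simplified proxy for how much information the mask lets through.
        return torch.sigmoid(self.alpha).mean()

def explain_frame(feature_extractor, classifier_head, frame, target, beta=10.0, steps=30):
    """Optimize the bottleneck mask for one frame; returns a spatial relevance map.
    feature_extractor / classifier_head are assumed splits of a frame classifier
    (e.g. an Xception-style network), passed in by the caller."""
    with torch.no_grad():
        feats = feature_extractor(frame)                     # (1, C, H, W) intermediate activations
    bottleneck = IBABottleneck(*feats.shape[1:])
    opt = torch.optim.Adam(bottleneck.parameters(), lr=0.5)
    for _ in range(steps):
        opt.zero_grad()
        logits = classifier_head(bottleneck(feats))
        # Keep the target prediction while penalizing information flow through the mask.
        loss = F.cross_entropy(logits, target) + beta * bottleneck.capacity()
        loss.backward()
        opt.step()
    # The optimized mask, averaged over channels, serves as the per-frame relevance map.
    return torch.sigmoid(bottleneck.alpha).mean(dim=1).squeeze(0).detach()

Applied per frame (and analogously to an optical-flow stream), such masks would yield the kind of relevance and motion maps the abstract describes; the exact capacity estimate, layer choice, and temporal aggregation are design decisions specific to VIBA and not reproduced here.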