Hierarchical Invariance for Robust and Interpretable Vision Tasks at Larger Scales

Developing robust and interpretable vision systems is a crucial step towards trustworthy artificial intelligence. A promising paradigm in this regard embeds task-required invariant structures, e.g., geometric invariance, in the fundamental image representation. However, such invariant representations typically exhibit limited discriminability, which restricts their application to larger-scale trustworthy vision tasks. To address this open problem, we conduct a systematic investigation of hierarchical invariance from theoretical, practical, and application perspectives. At the theoretical level, we show how to construct over-complete invariants with a Convolutional Neural Network (CNN)-like hierarchical architecture, yet in a fully interpretable manner. The general blueprint, specific definitions, invariant properties, and numerical implementations are provided. At the practical level, we discuss how to customize this theoretical framework for a given task. Owing to the over-completeness, features that are discriminative for the task can be formed adaptively in a Neural Architecture Search (NAS)-like manner. We support these arguments with accuracy, invariance, and efficiency results on texture, digit, and parasite classification experiments. Furthermore, at the application level, our representations are explored in real-world forensics tasks on adversarial perturbations and Artificial Intelligence Generated Content (AIGC). These applications reveal that the proposed strategy not only realizes the theoretically promised invariance, but also exhibits competitive discriminability even in the era of deep learning. For robust and interpretable vision tasks at larger scales, hierarchical invariant representations can be considered an effective alternative to traditional CNNs and invariants.
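
To make the idea of a CNN-like hierarchy of invariants concrete, the following is a minimal sketch and not the construction proposed in this work. It assumes the simplest possible setting: invariance to circular translations only, two hand-picked difference kernels, an absolute-value nonlinearity, and global average pooling. The names `circular_conv2d` and `hierarchical_invariants` are hypothetical and chosen for illustration. Each path through the filter cascade contributes one pooled feature, so the number of features grows with depth, which loosely mirrors the over-completeness mentioned above.

```python
import numpy as np


def circular_conv2d(image, kernel):
    """Circular 2-D convolution via the FFT; it commutes with circular shifts."""
    kernel_padded = np.zeros_like(image, dtype=float)
    kh, kw = kernel.shape
    kernel_padded[:kh, :kw] = kernel
    spectrum = np.fft.fft2(image) * np.fft.fft2(kernel_padded)
    return np.real(np.fft.ifft2(spectrum))


def hierarchical_invariants(image, kernels, depth=2):
    """Cascade conv -> |.| -> global mean, keeping one feature per filter path.

    Assumption for illustration: translation is modeled as a circular shift,
    so every pooled feature is exactly shift-invariant up to rounding error.
    """
    features = []
    layer = [np.asarray(image, dtype=float)]
    for _ in range(depth):
        next_layer = []
        for feature_map in layer:
            for kernel in kernels:
                # Equivariant convolution followed by a pointwise nonlinearity.
                response = np.abs(circular_conv2d(feature_map, kernel))
                next_layer.append(response)
                # Global pooling turns the equivariant map into an invariant.
                features.append(response.mean())
        layer = next_layer
    return np.array(features)


# Hand-picked horizontal and vertical difference kernels (illustrative only).
kernels = [np.array([[1.0, -1.0]]), np.array([[1.0], [-1.0]])]

rng = np.random.default_rng(0)
image = rng.random((16, 16))
shifted = np.roll(image, shift=(3, 5), axis=(0, 1))

features_original = hierarchical_invariants(image, kernels)
features_shifted = hierarchical_invariants(shifted, kernels)
print(np.allclose(features_original, features_shifted))
```

In this sketch, selecting a task-specific subset of the pooled features (for example with a sparse linear classifier) would play the role of the adaptive feature formation described above; that selection step is an assumption of the illustration and is not shown.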
