Test-time Adaptive Hierarchical Co-enhanced Denoising Network for Reliable Multimodal Classification

Reliable learning on low-quality multimodal data is a widely recognized concern, especially in safety-critical applications. Multimodal noise is a major challenge in this domain, and it leaves existing methods with two key limitations. First, they struggle to reliably remove heterogeneous data noise, which hinders robust multimodal representation learning. Second, they adapt and generalize poorly when they encounter previously unseen noise. To address these issues, we propose the Test-time Adaptive Hierarchical Co-enhanced Denoising Network (TAHCD). On the one hand, TAHCD introduces Adaptive Stable Subspace Alignment and Sample-Adaptive Confidence Alignment to reliably remove heterogeneous noise. Together, these modules account for noise at both the global and instance levels and enable joint removal of modality-specific and cross-modality noise, achieving robust learning. On the other hand, TAHCD introduces test-time cooperative enhancement, which adaptively updates the model in response to input noise in a label-free manner, improving adaptability and generalization. It does so by collaboratively enhancing the joint removal of modality-specific and cross-modality noise across the global and instance levels according to the noise in each sample. Experiments on multiple benchmarks demonstrate that the proposed method achieves superior classification performance, robustness, and generalization compared with state-of-the-art reliable multimodal learning approaches.

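To make the idea of a label-free test-time update concrete, the sketch below adapts a linear softmax classifier on an unlabeled test batch by minimizing the mean prediction entropy. This is not the TAHCD procedure, whose details the abstract does not give; entropy minimization is swapped in here as a common label-free objective, and every name (`softmax`, `mean_entropy`, `adapt_at_test_time`, the learning rate and step count) is a hypothetical choice made for illustration.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def mean_entropy(probs, eps=1e-12):
    """Average Shannon entropy of the predicted class distributions."""
    return float(-(probs * np.log(probs + eps)).sum(axis=1).mean())


def entropy_grad_logits(probs, eps=1e-12):
    """Gradient of each sample's entropy H with respect to its logits.

    For H = -sum_k p_k log p_k and p = softmax(z):
        dH/dz_k = -p_k * (log p_k + H)
    """
    log_p = np.log(probs + eps)
    h = -(probs * log_p).sum(axis=1, keepdims=True)
    return -probs * (log_p + h)


def adapt_at_test_time(W, b, X, lr=0.02, steps=50):
    """Hypothetical label-free update: gradient descent on mean entropy.

    W: (features, classes) weights, b: (classes,) bias,
    X: (samples, features) unlabeled test batch.
    Returns adapted copies; the inputs are left unchanged.
    """
    W, b = W.copy(), b.copy()
    n = X.shape[0]
    for _ in range(steps):
        probs = softmax(X @ W + b)
        g = entropy_grad_logits(probs)   # (samples, classes)
        W -= lr * (X.T @ g) / n          # chain rule through logits = XW + b
        b -= lr * g.mean(axis=0)
    return W, b
```

A typical call would pass the trained weights and an incoming test batch, for example `W_new, b_new = adapt_at_test_time(W, b, X_test)`. No labels appear anywhere in the update, which is the property the abstract attributes to test-time cooperative enhancement. How TAHCD ties its update to the noise level of each sample is not specified in the abstract and is not modeled here.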