DIS2: Disentanglement Meets Distillation with Classwise Attention for Robust Remote Sensing Segmentation under Missing Modalities

The efficacy of multimodal learning in remote sensing (RS) is severely undermined by missing modalities. The challenge is exacerbated by the high heterogeneity of RS data and its large scale variations. Consequently, paradigms proven effective in other domains often fail when confronted with these data characteristics. Conventional disentanglement learning, which relies on substantial feature overlap between modalities (i.e., modality-invariant features), is insufficient under such heterogeneity. Similarly, knowledge distillation degenerates into an ill-posed mimicry task in which the student fails to focus on the necessary compensatory knowledge, leaving the semantic gap unaddressed. Our work is therefore built upon three pillars designed specifically for RS: (1) principled compensation of missing information, (2) class-specific modality contribution, and (3) multi-resolution feature importance. We propose DIS2, a new paradigm that shifts from dependence on modality-shared features and untargeted imitation toward active, guided compensation of missing features. Its core novelty lies in a reformulated synergy between disentanglement learning and knowledge distillation, termed DLKD. DLKD explicitly captures compensatory features that, when fused with the features of the available modality, approximate the ideal fused representation of the full-modality case. To address the class-specific challenge, our Classwise Feature Learning Module (CFLM) adaptively learns discriminative evidence for each target class depending on signal availability. Both DLKD and CFLM are supported by a hierarchical hybrid fusion (HF) structure that exploits features across multiple resolutions to strengthen prediction. Extensive experiments validate that our proposed approach significantly outperforms state-of-the-art methods across multiple benchmarks.
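The compensation idea behind DLKD, where compensatory features fused with available-modality features should match the full-modality fused representation, can be read as a distillation objective. The toy sketch below illustrates that reading together with an availability-aware, per-class modality weighting in the spirit of CFLM. It is a hedged illustration under stated assumptions, not the DIS2 implementation. `fuse` is an assumed element-wise sum, the loss is a plain mean squared error, and `classwise_modality_weights` is a hypothetical stand-in that applies a softmax over the available modalities only. Feature maps are flattened into short Python lists for readability.

```python
import math
from typing import List, Sequence

Vector = List[float]


def fuse(a: Sequence[float], b: Sequence[float]) -> Vector:
    """Assumed fusion operator: element-wise sum (a placeholder, not DIS2's fusion)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return [x + y for x, y in zip(a, b)]


def compensation_loss(f_avail: Sequence[float], f_comp: Sequence[float],
                      f_full_fused: Sequence[float]) -> float:
    """Mean squared error between the student's fused features
    (available modality + predicted compensatory features) and the
    teacher's fused full-modality features, used as the distillation target."""
    student = fuse(f_avail, f_comp)
    if len(student) != len(f_full_fused):
        raise ValueError("student and teacher features must have the same length")
    return sum((s - t) ** 2 for s, t in zip(student, f_full_fused)) / len(student)


def classwise_modality_weights(scores: Sequence[Sequence[float]],
                               available: Sequence[bool]) -> List[Vector]:
    """Hypothetical CFLM stand-in: scores[c][m] is a logit for how strongly
    class c should rely on modality m. Missing modalities get weight 0 and
    each class's weights are a softmax over the available modalities."""
    if not any(available):
        raise ValueError("at least one modality must be available")
    weights = []
    for class_scores in scores:
        # Subtract the max available logit for numerical stability.
        peak = max(s for s, ok in zip(class_scores, available) if ok)
        exps = [math.exp(s - peak) if ok else 0.0
                for s, ok in zip(class_scores, available)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


if __name__ == "__main__":
    f_avail = [0.5, 1.0, -0.2]   # toy features from the available modality
    f_full = [0.9, 1.4, 0.1]     # toy teacher features fused from all modalities
    print(compensation_loss(f_avail, [0.0, 0.0, 0.0], f_full))  # no compensation
    print(compensation_loss(f_avail, [0.4, 0.4, 0.3], f_full))  # near-perfect compensation
    # Two classes, two modalities (e.g., optical and SAR); the second one is missing.
    print(classwise_modality_weights([[2.0, 0.0], [0.0, 1.0]], [True, False]))
```

In this reading, the loss is driven toward zero only when the compensatory branch supplies what the missing modality would have contributed, which is the targeted alternative to untargeted mimicry. The availability mask lets each class shift its reliance toward the remaining modality. In a real model, the features and scores would be learned tensors rather than constants.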
