Continual learning (CL) breaks the one-way training paradigm and enables a model to adapt continuously to new data, semantics, and tasks. However, current CL methods focus mainly on single tasks. Moreover, because old data are unavailable, CL models suffer from catastrophic forgetting and semantic drift, which are especially severe in remote-sensing interpretation owing to its intricate fine-grained semantics. In this paper, we propose Continual Panoptic Perception (CPP), a unified continual learning model that leverages multi-task joint learning, covering pixel-level classification, instance-level segmentation, and image-level perception, for universal interpretation of remote-sensing images. Concretely, we propose a collaborative cross-modal encoder (CCE) to extract input image features, supporting pixel classification and caption generation simultaneously. To inherit knowledge from the old model without exemplar memory, we propose a task-interactive knowledge distillation (TKD) method, which leverages cross-modal optimization and task-asymmetric pseudo-labeling (TPL) to alleviate catastrophic forgetting. Furthermore, we propose a joint optimization mechanism to achieve end-to-end multi-modal panoptic perception. Experimental results on a fine-grained panoptic perception dataset validate the effectiveness of the proposed model and show that joint optimization boosts sub-task CL efficiency, yielding over 13\% relative improvement in panoptic quality.
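To make the exemplar-free continual-learning ingredients concrete, the following is a minimal sketch of the two generic mechanisms the abstract invokes: a distillation loss that keeps the new model's outputs close to the old model's on previously learned classes, and confidence-thresholded pseudo-labels produced by the old model on unannotated background pixels. The function names, the temperature, and the threshold are illustrative assumptions, not the paper's exact TKD/TPL formulation.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax over class logits.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(old_logits, new_logits, T=2.0):
    """Temperature-softened KL divergence between old- and new-model
    predictions (standard knowledge distillation; a stand-in for the
    paper's task-interactive variant, whose exact form differs)."""
    p_old = softmax(old_logits / T)
    p_new = softmax(new_logits / T)
    kl = np.sum(p_old * (np.log(p_old) - np.log(p_new)), axis=-1)
    return float(np.mean(kl) * T * T)

def pseudo_labels(old_probs, threshold=0.7):
    """Keep the old model's class prediction where it is confident;
    mark the rest as ignore (-1) so only the new-task annotations
    supervise those pixels."""
    conf = old_probs.max(axis=-1)
    labels = old_probs.argmax(axis=-1)
    labels[conf < threshold] = -1
    return labels
```

In a CL step, the total loss would combine the supervised loss on new-class annotations, the distillation term on old-class logits, and cross-entropy against the pseudo-labels, so that old knowledge is inherited without storing any exemplars.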