Modern visual systems have a wide range of potential applications in natural science research, such as aiding species discovery and monitoring animals in the wild. However, real-world vision tasks are subject to changing environmental conditions, which shift the appearance of captured images. To address this issue, we introduce Domain-Aware Continual Zero-Shot Learning (DACZSL), the task of recognizing images of unseen categories in continuously changing domains. Accordingly, we propose a Domain-Invariant Network (DIN) that learns factorized features for shifting domains and improved textual representations for unseen classes. DIN continually learns a global shared network for domain-invariant and task-invariant features, together with per-task private networks for task-specific features. Furthermore, we enhance this dual network with class-wise learnable prompts to improve class-level text representation, thereby improving zero-shot prediction of future unseen classes. To evaluate DACZSL, we introduce two benchmarks, DomainNet-CZSL and iWildCam-CZSL. Our results show that DIN outperforms existing baselines by over 5% in harmonic accuracy and over 1% in backward transfer, setting a new state of the art (SoTA).
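The abstract reports results in harmonic accuracy and backward transfer without defining either metric. The sketch below illustrates one common reading of each: the harmonic mean of seen-class and unseen-class accuracy, as often used in generalized zero-shot learning, and the average change in accuracy on earlier tasks after training on the final task, as often used in continual learning. These definitions, the function names, and the layout of the accuracy matrix are assumptions made for illustration; the exact formulas used for DACZSL may differ.

```python
from typing import Sequence


def harmonic_accuracy(seen_acc: float, unseen_acc: float) -> float:
    """Harmonic mean of seen-class and unseen-class accuracy (assumed definition)."""
    if seen_acc + unseen_acc == 0:
        # Both accuracies are zero, so the harmonic mean is taken to be zero.
        return 0.0
    return 2 * seen_acc * unseen_acc / (seen_acc + unseen_acc)


def backward_transfer(acc: Sequence[Sequence[float]]) -> float:
    """Average change in accuracy on earlier tasks after the final task (assumed definition).

    acc[i][j] is the accuracy on task j measured after training on task i.
    A negative value indicates forgetting; a positive value indicates that
    later training improved performance on earlier tasks.
    """
    num_tasks = len(acc)
    if num_tasks < 2:
        # With a single task there is nothing to transfer back to.
        return 0.0
    final = acc[num_tasks - 1]
    # Compare final accuracy on each earlier task with the accuracy
    # measured right after that task was learned.
    deltas = [final[j] - acc[j][j] for j in range(num_tasks - 1)]
    return sum(deltas) / len(deltas)
```

Under these assumed definitions, harmonic accuracy rewards models that balance seen and unseen classes, since a low score on either side pulls the mean down, while backward transfer isolates how much earlier tasks degrade or improve as new domains and classes arrive.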