Memory-Efficient Pseudo-Labeling for Online Source-Free Universal Domain Adaptation using a Gaussian Mixture Model

In practice, domain shifts are likely to occur between training and test data, necessitating domain adaptation (DA) to adjust the pre-trained source model to the target domain. Recently, universal domain adaptation (UniDA) has gained attention for addressing the possibility of an additional category (label) shift between the source and target domains. This means that new classes can appear in the target data, some source classes may no longer be present, or both at the same time. For practical applicability, UniDA methods must handle both source-free and online scenarios, enabling adaptation without access to the source data and performing batch-wise updates in parallel with prediction. In an online setting, preserving knowledge across batches is crucial. However, existing methods often require substantial memory, e.g., by using memory queues, which is impractical because memory is limited and valuable, particularly on embedded systems. Therefore, we consider memory efficiency as an additional constraint in this paper. To achieve memory-efficient online source-free universal domain adaptation (SF-UniDA), we propose a novel method that continuously captures the distribution of known classes in the feature space using a Gaussian mixture model (GMM). This approach, combined with entropy-based out-of-distribution detection, allows for the generation of reliable pseudo-labels. Finally, we combine a contrastive loss with a KL divergence loss to perform the adaptation. Our approach not only achieves state-of-the-art results in all experiments on the DomainNet dataset but also significantly outperforms existing methods on the challenging VisDA-C dataset, setting a new benchmark for online SF-UniDA. Our code is available at https://github.com/pascalschlachter/GMM.
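The sketch below illustrates why a per-class GMM is memory-efficient compared with a feature queue, and how it can be combined with entropy-based unknown detection to produce pseudo-labels. It is written in NumPy and makes several assumptions of its own: each known class gets one diagonal-covariance Gaussian, mixture weights come from Laplace-smoothed running counts, and a sample is flagged as unknown when the normalized entropy of the classifier output exceeds a threshold. The names `OnlineDiagonalGMM` and `pseudo_label` and the default `threshold=0.5` are hypothetical. This is not the authors' implementation; the linked repository contains that.

```python
import numpy as np


class OnlineDiagonalGMM:
    """Hypothetical sketch: one diagonal Gaussian per known class.

    Only K means, K variance vectors and K counts are stored, so memory is
    O(K * D) regardless of how many batches have been processed.
    """

    def __init__(self, num_classes: int, dim: int, eps: float = 1e-6):
        self.counts = np.zeros(num_classes)
        self.means = np.zeros((num_classes, dim))
        self.vars = np.ones((num_classes, dim))
        self.eps = eps  # variance floor for numerical stability

    def log_posterior(self, feats: np.ndarray) -> np.ndarray:
        """Log responsibilities log p(class | feature), shape (N, K)."""
        var = self.vars + self.eps
        diff = feats[:, None, :] - self.means[None, :, :]
        log_lik = -0.5 * np.sum(diff**2 / var + np.log(2 * np.pi * var), axis=2)
        # Mixture weights from Laplace-smoothed running counts (assumption).
        k = len(self.counts)
        log_w = np.log((self.counts + 1) / (self.counts.sum() + k))
        log_joint = log_lik + log_w
        # Log-sum-exp normalization over the K components.
        m = log_joint.max(axis=1, keepdims=True)
        return log_joint - m - np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True))

    def update(self, feats: np.ndarray, labels: np.ndarray) -> None:
        """Merge per-class batch statistics into the running mean/variance."""
        for c in np.unique(labels):
            x = feats[labels == c]
            n_a, n_b = self.counts[c], len(x)
            n = n_a + n_b
            delta = x.mean(axis=0) - self.means[c]
            # Pairwise combination of squared-deviation sums (Chan et al.).
            m2 = self.vars[c] * n_a + x.var(axis=0) * n_b + delta**2 * n_a * n_b / n
            self.means[c] += delta * n_b / n
            self.vars[c] = m2 / n
            self.counts[c] = n


def pseudo_label(gmm, feats, probs, threshold=0.5):
    """Return pseudo-labels; -1 marks samples treated as unknown (OOD).

    Hypothetical rule: if the normalized entropy of the classifier output
    exceeds `threshold`, the sample is unknown; otherwise it gets the
    argmax of the GMM posterior.
    """
    ent = -np.sum(probs * np.log(probs + 1e-12), axis=1) / np.log(probs.shape[1])
    labels = gmm.log_posterior(feats).argmax(axis=1)
    labels[ent > threshold] = -1
    return labels
```

In this sketch, an online loop would call `pseudo_label` on each incoming batch and then `gmm.update(feats[labels >= 0], labels[labels >= 0])`. Knowledge from earlier batches is therefore carried forward only through the stored statistics, not through stored features. The contrastive and KL divergence losses used for the actual adaptation are not covered here.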

翻译：在实际应用中，训练数据与测试数据之间常出现域偏移，这需要通过域自适应技术将预训练的源域模型调整至目标域。近年来，通用域自适应方法因能处理源域与目标域间可能存在的额外类别偏移而受到关注。这意味着目标数据中可能出现新类别，部分源域类别可能消失，或两种情况同时发生。为满足实际应用需求，通用域自适应方法需同时支持无源场景与在线场景：即在无法访问源数据的情况下完成自适应，并实现预测过程中的逐批次更新。在线场景中，跨批次的知识保持至关重要。然而，现有方法常需大量内存（如使用记忆队列），这在内存受限的嵌入式系统中尤为不切实际。为此，本文额外引入内存效率约束。为实现高效内存的在线无源通用域自适应，我们提出一种创新方法：通过高斯混合模型持续捕捉特征空间中已知类别的分布特征。该方法结合基于熵的分布外检测机制，能够生成可靠的伪标签。最后，我们融合对比损失与KL散度损失完成自适应过程。该方法不仅在DomainNet数据集的所有实验中达到最优性能，更在具有挑战性的VisDA-C数据集上显著超越现有方法，为在线无源通用域自适应设立了新基准。代码已开源：https://github.com/pascalschlachter/GMM。