Memory-Efficient Pseudo-Labeling for Online Source-Free Universal Domain Adaptation using a Gaussian Mixture Model

In practice, domain shifts are likely to occur between training and test data, necessitating domain adaptation (DA) to adjust the pre-trained source model to the target domain. Recently, universal domain adaptation (UniDA) has gained attention for addressing the possibility of an additional category (label) shift between the source and target domains. This means that new classes can appear in the target data, some source classes may no longer be present, or both at the same time. For practical applicability, UniDA methods must handle both source-free and online scenarios, enabling adaptation without access to the source data and performing batch-wise updates in parallel with prediction. In an online setting, preserving knowledge across batches is crucial. However, existing methods often require substantial memory, which is impractical because memory is limited and valuable, particularly on embedded systems. Therefore, we consider memory efficiency as an additional constraint. To achieve memory-efficient online source-free universal domain adaptation (SF-UniDA), we propose a novel method that continuously captures the distribution of known classes in the feature space using a Gaussian mixture model (GMM). This approach, combined with entropy-based out-of-distribution detection, allows for the generation of reliable pseudo-labels. Finally, we combine a contrastive loss with a KL divergence loss to perform the adaptation. Our approach not only achieves state-of-the-art results in all experiments on the DomainNet and Office-Home datasets but also significantly outperforms existing methods on the challenging VisDA-C dataset, setting a new benchmark for online SF-UniDA. Our code is available at https://github.com/pascalschlachter/GMM.
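
To make the pseudo-labeling idea concrete, the following is a minimal sketch of how a GMM over known-class features could be combined with an entropy-based rejection rule. It is not the authors' implementation (see the linked repository for that). The function name `gmm_pseudo_labels`, the diagonal covariances, the use of the normalized entropy of the GMM posteriors as the out-of-distribution score, and the threshold of 0.5 are all assumptions made for illustration.

```python
import numpy as np


def gmm_pseudo_labels(features, means, variances, weights, entropy_threshold=0.5):
    """Pseudo-label features with a GMM that has one component per known class.

    Hypothetical helper, not taken from the paper. Assumes diagonal
    covariances and at least two known classes.

    features:  (n, d) feature vectors of the current target batch
    means:     (k, d) component means, one per known class
    variances: (k, d) per-dimension variances of each component
    weights:   (k,)   mixture weights summing to 1
    Returns (labels, entropy); labels are class indices, or -1 for
    samples rejected as "unknown".
    """
    # Log density of every sample under every component, shape (n, k).
    diff = features[:, None, :] - means[None, :, :]
    log_pdf = -0.5 * (
        np.sum(diff**2 / variances[None, :, :], axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
    )
    log_joint = log_pdf + np.log(weights)[None, :]

    # Numerically stable softmax over components gives the posteriors.
    log_joint = log_joint - log_joint.max(axis=1, keepdims=True)
    posterior = np.exp(log_joint)
    posterior /= posterior.sum(axis=1, keepdims=True)

    # Entropy normalized by log(k) so that it lies in [0, 1].
    entropy = -np.sum(posterior * np.log(posterior + 1e-12), axis=1)
    entropy /= np.log(means.shape[0])

    # Confident samples keep the most likely class; the rest are rejected.
    labels = posterior.argmax(axis=1)
    labels[entropy > entropy_threshold] = -1
    return labels, entropy
```

Only the component parameters (`means`, `variances`, `weights`) need to be kept between batches in this sketch, which is the kind of memory saving the abstract refers to, in contrast to storing past features. How the parameters are updated online and how the losses are computed is not specified in the abstract and is left out here.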
