UniTTA: Unified Benchmark and Versatile Framework Towards Realistic Test-Time Adaptation

Test-Time Adaptation (TTA) aims to adapt pre-trained models to the target domain during testing. In practice, how well a model adapts depends on multiple factors. Researchers have identified various challenging scenarios, such as continual domain shifts, mixed domains, and temporally correlated or imbalanced class distributions, and have developed diverse methods to address them. Despite these efforts, a unified and comprehensive benchmark has yet to be established. To this end, we propose a Unified Test-Time Adaptation (UniTTA) benchmark, which is comprehensive and widely applicable. Each scenario within the benchmark is fully described by a Markov state transition matrix for sampling from the original dataset. The UniTTA benchmark treats domain and class as two independent dimensions of data and addresses the combinations of imbalance/balance and i.i.d./non-i.i.d./continual conditions along each dimension, covering a total of \( (2 \times 3)^2 = 36 \) scenarios. It establishes a comprehensive evaluation benchmark for realistic TTA and provides a guideline for practitioners to select the most suitable TTA method. Alongside this benchmark, we propose a versatile UniTTA framework, which includes a Balanced Domain Normalization (BDN) layer and a COrrelated Feature Adaptation (COFA) method, designed to mitigate distribution gaps in domain and class, respectively. Extensive experiments demonstrate that our UniTTA framework excels within the UniTTA benchmark and achieves state-of-the-art performance on average. Our code is available at \url{https://github.com/LeapLabTHU/UniTTA}.
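
To make the scenario count and the role of the transition matrix concrete, the following is a minimal illustrative sketch, not the benchmark's actual implementation. It enumerates the \( (2 \times 3)^2 = 36 \) setting pairs and shows how a row-stochastic matrix over classes (or domains) can drive the order of a test stream. The abstract does not specify how the matrices are constructed, so the helper names (`iid_matrix`, `sticky_matrix`, `continual_matrix`, `sample_states`) and the particular constructions are assumptions chosen only for illustration.

```python
import itertools
import random

BALANCE = ("balanced", "imbalanced")
ORDER = ("i.i.d.", "non-i.i.d.", "continual")


def enumerate_scenarios():
    """Pair every domain setting with every class setting: (2 * 3) ** 2 = 36."""
    per_axis = list(itertools.product(BALANCE, ORDER))  # 6 settings per axis
    return list(itertools.product(per_axis, per_axis))


def iid_matrix(probs):
    """Every row equals `probs`, so the next state ignores the current one.
    A non-uniform `probs` gives an imbalanced stream (assumed construction)."""
    return [list(probs) for _ in probs]


def sticky_matrix(probs, stay):
    """Assumed non-i.i.d. chain: keep the current state with probability
    `stay`, otherwise redraw the state from `probs`."""
    n = len(probs)
    return [
        [stay * (i == j) + (1.0 - stay) * probs[j] for j in range(n)]
        for i in range(n)
    ]


def continual_matrix(n, stay):
    """Assumed continual chain: states are visited in order and never revisited."""
    rows = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        rows[i][i] = stay
        rows[i][i + 1] = 1.0 - stay
    rows[n - 1][n - 1] = 1.0  # the last state is absorbing
    return rows


def sample_states(matrix, length, rng, start=0):
    """Run the Markov chain for `length` steps and return the visited states."""
    states = [start]
    while len(states) < length:
        row = matrix[states[-1]]
        states.append(rng.choices(range(len(row)), weights=row)[0])
    return states


if __name__ == "__main__":
    rng = random.Random(0)
    print(len(enumerate_scenarios()))
    # Imbalanced, temporally correlated stream over three classes.
    print(sample_states(sticky_matrix([0.6, 0.3, 0.1], stay=0.9), 20, rng))
```

Under these assumptions, each sampled state would then be mapped to a test example by drawing an item of that class (or domain) from the original dataset; the class and domain chains can be run independently, which matches the abstract's treatment of the two as independent dimensions.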
