In this paper, we present a comprehensive survey of online test-time adaptation (OTTA), a paradigm that adapts machine learning models to novel data distributions as each batch arrives. Despite the recent proliferation of OTTA methods, the field is hampered by ambiguous settings, outdated backbones, and inconsistent hyperparameter tuning, which obscure the real challenges and make results hard to reproduce. For clarity and a rigorous comparison, we classify OTTA techniques into three primary categories and benchmark them using the powerful Vision Transformer (ViT) backbone to identify genuinely effective strategies. Our benchmarks span not only conventional corruption datasets such as CIFAR-10/100-C and ImageNet-C but also real-world shifts embodied in CIFAR-10.1 and CIFAR-10-Warehouse, encompassing variations across search engines and data synthesized by diffusion models. To gauge efficiency in online scenarios, we introduce novel evaluation metrics, including FLOPs, that shed light on the trade-offs between adaptation accuracy and computational overhead. Our findings diverge from the existing literature, indicating that: (1) transformers exhibit heightened resilience to diverse domain shifts, (2) the efficacy of many OTTA methods hinges on ample batch sizes, and (3) stability in optimization and resistance to perturbations are critical during adaptation, especially when the batch size is 1. Motivated by these insights, we point out promising directions for future research. The source code will be made available.
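To make the online setting and the batch-size sensitivity noted in finding (2) concrete, the following is a minimal toy sketch. It is not a method from the survey: the `OnlineNormalizer` class, the `run_stream` helper, the momentum value, and the synthetic Gaussian stream are all assumptions introduced purely for illustration. The sketch updates a scalar feature's running normalization statistics each time a test batch arrives, which is one simple way an adaptation rule can depend on the batch size.

```python
import random
import statistics


class OnlineNormalizer:
    """Hypothetical helper: running mean/variance of one scalar feature."""

    def __init__(self, mean=0.0, var=1.0, momentum=0.1):
        # Initial values stand in for statistics estimated on source data.
        self.mean = mean
        self.var = var
        self.momentum = momentum

    def adapt(self, batch):
        # Move the stored statistics toward those of the arriving test batch.
        batch_mean = statistics.fmean(batch)
        # Population variance; this is exactly 0.0 for a batch of one sample.
        batch_var = statistics.pvariance(batch, mu=batch_mean)
        m = self.momentum
        self.mean = (1 - m) * self.mean + m * batch_mean
        self.var = (1 - m) * self.var + m * batch_var

    def normalize(self, batch, eps=1e-5):
        return [(x - self.mean) / (self.var + eps) ** 0.5 for x in batch]


def run_stream(batch_size, num_samples=2048, seed=0):
    """Adapt on a shifted stream, one batch at a time, without any labels."""
    rng = random.Random(seed)
    # Assumed test distribution: shifted and rescaled relative to the source.
    stream = [rng.gauss(3.0, 2.0) for _ in range(num_samples)]
    norm = OnlineNormalizer()
    for start in range(0, num_samples, batch_size):
        batch = stream[start:start + batch_size]
        norm.adapt(batch)                 # adapt as soon as the batch arrives
        _ = norm.normalize(batch)         # then predict on the same batch
    return norm.mean, norm.var


if __name__ == "__main__":
    for bs in (64, 1):
        mean, var = run_stream(batch_size=bs)
        print(f"batch_size={bs}: mean={mean:.3f}, var={var:.3g}")
```

With a batch size of 64, the running statistics drift toward those of the shifted stream. With a batch size of 1, every per-batch variance is zero, so the variance estimate decays toward zero and the normalized outputs become unstable. This toy behaviour only hints at why small batches can be difficult; it does not reproduce any result from the benchmark.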