In this paper, we present a comprehensive survey of online test-time adaptation (OTTA), a paradigm that adapts machine learning models to novel data distributions as each batch arrives. Despite the recent proliferation of OTTA methods, the field is mired in issues such as ambiguous settings, antiquated backbones, and inconsistent hyperparameter tuning, which obscure the real challenges and make reproducibility elusive. For clarity and rigorous comparison, we classify OTTA techniques into three primary categories and benchmark them using the potent Vision Transformer (ViT) backbone to discover genuinely effective strategies. Our benchmarks span not only conventional corrupted datasets such as CIFAR-10/100-C and ImageNet-C but also real-world shifts embodied in CIFAR-10.1 and CIFAR-10-Warehouse, encapsulating variations across search engines and data synthesized by diffusion models. To gauge efficiency in online scenarios, we introduce novel evaluation metrics, including FLOPs, that shed light on the trade-offs between adaptation accuracy and computational overhead. Our findings diverge from the existing literature, indicating that (1) transformers exhibit heightened resilience to diverse domain shifts, (2) the efficacy of many OTTA methods hinges on ample batch sizes, and (3) stability in optimization and resistance to perturbations are critical during adaptation, especially when the batch size is 1. Motivated by these insights, we point out promising directions for future research. The source code is available at https://github.com/Jo-wang/OTTA_ViT_survey.