Test-time adaptation (TTA) is a technique for improving the generalization performance of models by leveraging unlabeled samples solely during prediction. Given the need for robustness in neural network systems facing distribution shifts, numerous TTA methods have recently been proposed. However, these methods are often evaluated under different settings, such as varying distribution shifts, backbones, and scenario designs, which leaves the field without a consistent and fair benchmark for validating their effectiveness. To address this issue, we present a benchmark that systematically evaluates 13 prominent TTA methods and their variants on five widely used image classification datasets: CIFAR-10-C, CIFAR-100-C, ImageNet-C, DomainNet, and Office-Home. These methods cover a wide range of adaptation scenarios (e.g., online vs. offline adaptation, and instance vs. batch vs. domain adaptation). Furthermore, we explore the compatibility of different TTA methods with diverse network backbones. To implement this benchmark, we have developed a unified framework in PyTorch that allows consistent evaluation and comparison of TTA methods across the different datasets and network architectures. By establishing this benchmark, we aim to provide researchers and practitioners with a reliable means of assessing and comparing how effectively TTA methods improve model robustness and generalization performance. Our code is available at https://github.com/yuyongcan/Benchmark-TTA.