Test-Time Adaptation (TTA) allows pre-trained models to be updated to changing data distributions at deployment time. While early work tested these algorithms on individual, fixed distribution shifts, recent work has proposed and applied methods for continual adaptation over long timescales. To examine the reported progress in the field, we propose the Continually Changing Corruptions (CCC) benchmark, which measures the asymptotic performance of TTA techniques. We find that eventually all but one of the state-of-the-art methods collapse and perform worse than a non-adapting model, including methods specifically proposed to be robust to performance collapse. In addition, we introduce a simple baseline, "RDumb", that periodically resets the model to its pre-trained state. RDumb performs better than or on par with the previously proposed state-of-the-art on all considered benchmarks. Our results show that previous TTA approaches are neither effective at regularizing adaptation to avoid collapse nor able to outperform a simplistic resetting strategy.
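The periodic-reset idea behind the baseline can be illustrated with a minimal, framework-free sketch. This is not the authors' implementation: the class name `PeriodicResetAdapter`, the `reset_every` parameter, and the toy `drifting_update` step are hypothetical names introduced for illustration, and the abstract does not specify a reset interval or the underlying adaptation step.

```python
import copy


class PeriodicResetAdapter:
    """Wrap an adaptation step and restore the initial state every `reset_every` steps.

    All names here are illustrative assumptions, not taken from the paper.
    """

    def __init__(self, initial_state, adapt_step, reset_every):
        if reset_every < 1:
            raise ValueError("reset_every must be >= 1")
        # Keep a private copy of the pre-trained state so later updates cannot alter it.
        self._initial_state = copy.deepcopy(initial_state)
        self.state = copy.deepcopy(initial_state)
        self._adapt_step = adapt_step
        self._reset_every = reset_every
        self.steps_since_reset = 0

    def step(self, batch):
        # Once `reset_every` adaptation steps have accumulated, discard them.
        if self.steps_since_reset == self._reset_every:
            self.state = copy.deepcopy(self._initial_state)
            self.steps_since_reset = 0
        self.state = self._adapt_step(self.state, batch)
        self.steps_since_reset += 1
        return self.state


def drifting_update(state, batch):
    """Toy stand-in for a TTA update: the bias drifts by the batch mean at every step."""
    return {"bias": state["bias"] + sum(batch) / len(batch)}


adapter = PeriodicResetAdapter({"bias": 0.0}, drifting_update, reset_every=3)
history = [adapter.step([1.0, 1.0])["bias"] for _ in range(7)]
# history == [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0]
```

In this toy run the bias would grow without bound under unconstrained adaptation, whereas the wrapper caps the drift at whatever accumulates within one reset window. With a real network, the state would be the model (and optimizer) parameters and the step would be an actual TTA update.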