Test-Time Adaptation (TTA) updates pretrained models at deployment time so that they adapt to changing data distributions. While early work tested these algorithms on individual, fixed distribution shifts, recent work has proposed and applied methods for continual adaptation over long timescales. To examine the reported progress in the field, we propose the Continuously Changing Corruptions (CCC) benchmark, which measures the asymptotic performance of TTA techniques. We find that all but one of the state-of-the-art methods eventually collapse and perform worse than a non-adapting model, including methods specifically proposed to be robust to performance collapse. In addition, we introduce a simple baseline, "RDumb", that periodically resets the model to its pretrained state. RDumb performs better than or on par with the previously proposed state-of-the-art on all considered benchmarks. Our results show that previous TTA approaches are neither effective at regularizing adaptation to avoid collapse nor able to outperform a simplistic resetting strategy.
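The periodic-reset idea behind the baseline can be illustrated with a minimal sketch. The text above states only that the model is periodically reset to its pretrained state, so everything else here is an assumption made for illustration: the class name `PeriodicResetAdapter`, the `reset_every` parameter, the fixed step-count schedule, and the toy `drifting_step` function are all hypothetical and are not taken from the authors' implementation.

```python
import copy


class PeriodicResetAdapter:
    """Hypothetical wrapper: runs an adaptation step on every batch and
    restores the saved pretrained state after every `reset_every` steps."""

    def __init__(self, pretrained_state, adapt_step, reset_every):
        if reset_every < 1:
            raise ValueError("reset_every must be at least 1")
        # Keep an untouched copy of the pretrained state to restore later.
        self._pretrained = copy.deepcopy(pretrained_state)
        self.state = copy.deepcopy(pretrained_state)
        self.adapt_step = adapt_step
        self.reset_every = reset_every
        self.steps_since_reset = 0

    def step(self, batch):
        # Discard everything learned since the last reset once the
        # (assumed) fixed interval has elapsed.
        if self.steps_since_reset == self.reset_every:
            self.state = copy.deepcopy(self._pretrained)
            self.steps_since_reset = 0
        # `adapt_step` stands in for any TTA update rule: it takes the
        # current state and a batch, and returns (new_state, output).
        self.state, output = self.adapt_step(self.state, batch)
        self.steps_since_reset += 1
        return output


def drifting_step(state, batch):
    """Toy stand-in for a TTA update: the state drifts by one unit per
    step, mimicking slow degradation. The output reports the drift that
    was in effect when the batch was processed."""
    output = state["drift"]
    return {"drift": state["drift"] + 1}, output


adapter = PeriodicResetAdapter({"drift": 0}, drifting_step, reset_every=3)
outputs = [adapter.step(batch=None) for _ in range(7)]
print(outputs)  # [0, 1, 2, 0, 1, 2, 0]
```

In this toy run the accumulated drift never exceeds two steps, because the wrapper falls back to the stored pretrained state at a fixed interval. With a real model, the state would be the model and optimizer parameters, and `adapt_step` would be the chosen adaptation method.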