Test-time adaptation (TTA) updates models during inference to reduce error under distribution shift. While entropy minimization over the output distribution has proven effective as a TTA loss, we instead study the intermediate distributions that transformers compute in the attention mechanism. We propose LookSharp, a novel TTA objective that minimizes the entropy of the final layer's CLS-to-patch attention, encouraging the model to keep its attention focused on shifted data. We show that attention entropy minimization improves robustness on ImageNet-C, is complementary to output entropy minimization, and maintains performance on clean data.
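To make the objective concrete, here is a minimal PyTorch sketch of an attention-entropy loss of this kind. The function name, the CLS-at-index-0 convention, and the random logits standing in for a real ViT's final-layer attention are illustrative assumptions, not the paper's exact implementation; in practice the attention would come from the model's last block (e.g., via a forward hook) and, as in typical TTA setups, gradients would usually update only a small parameter subset such as normalization layers.

```python
import torch

def cls_attention_entropy(attn: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # attn: (batch, heads, tokens, tokens), already softmaxed over the last
    # dim by the attention layer; token 0 is assumed to be CLS.
    cls_to_patch = attn[:, :, 0, 1:]  # (B, H, N-1): CLS query row, patch key columns
    # Renormalize over patches after dropping the CLS->CLS entry.
    cls_to_patch = cls_to_patch / cls_to_patch.sum(dim=-1, keepdim=True)
    # Shannon entropy per sample and head, then averaged into a scalar loss.
    entropy = -(cls_to_patch * (cls_to_patch + eps).log()).sum(dim=-1)
    return entropy.mean()

# Toy usage: random logits stand in for a real ViT's final-layer attention
# (assumed shapes: 8 images, 12 heads, 196 patches + 1 CLS token).
logits = torch.randn(8, 12, 197, 197, requires_grad=True)
attn = logits.softmax(dim=-1)
loss = cls_attention_entropy(attn)
loss.backward()  # in real TTA, gradients would flow to the adapted parameters
```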