Advances in generative modeling, particularly the advent of diffusion models, have raised a fundamental question: how can these models be used effectively for discriminative tasks? In this work, we find that generative models can serve as effective test-time adapters for discriminative models. Our method, Diffusion-TTA, adapts pre-trained discriminative models, such as image classifiers, segmenters, and depth predictors, to each unlabelled example in the test set using generative feedback from a diffusion model. We achieve this by modulating the conditioning of the diffusion model with the output of the discriminative model. We then maximize the image likelihood objective by backpropagating its gradients to the discriminative model's parameters. We show that Diffusion-TTA significantly improves the accuracy of various large-scale pre-trained discriminative models, such as ImageNet classifiers, CLIP models, image pixel labellers, and image depth predictors. Diffusion-TTA outperforms existing test-time adaptation methods, including TTT-MAE and TENT, and particularly shines in online adaptation setups, where the discriminative model is continually adapted to each example in the test set. We provide access to code, results, and visualizations on our website: https://diffusion-tta.github.io/.
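The adaptation loop described above can be illustrated with a small NumPy toy. This is only a sketch under strong assumptions, not the released implementation: the linear classifier `W`, the `prototypes` matrix standing in for class embeddings, the probability-weighted condition, the closed-form "denoiser", and the hand-written backward pass are all hypothetical stand-ins for a pre-trained network, a pre-trained diffusion model, and automatic differentiation.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def diffusion_loss_and_grad(W, x, prototypes, alpha_bar, eps):
    """Denoising loss for one image x and its gradient w.r.t. classifier weights W."""
    probs = softmax(W @ x)               # classifier output, shape (K,)
    cond = probs @ prototypes            # probability-weighted condition, shape (D,)
    a = alpha_bar[:, None]               # noise levels, shape (N, 1)
    x_t = np.sqrt(a) * x + np.sqrt(1 - a) * eps   # forward-noised copies of x, (N, D)
    # Toy frozen "denoiser" (assumption): it treats the condition as the clean image.
    eps_pred = (x_t - np.sqrt(a) * cond) / np.sqrt(1 - a)
    resid = eps_pred - eps
    loss = np.mean(np.sum(resid ** 2, axis=1))
    # Manual backprop: loss -> cond -> probs -> logits -> W.
    g_cond = np.mean(2 * resid * (-np.sqrt(a) / np.sqrt(1 - a)), axis=0)
    g_probs = prototypes @ g_cond
    g_logits = probs * (g_probs - probs @ g_probs)   # softmax Jacobian-vector product
    return loss, np.outer(g_logits, x)

def adapt(W, x, prototypes, steps=50, lr=0.1, n_samples=8, seed=0):
    """Adapt only the classifier to a single unlabelled example x."""
    rng = np.random.default_rng(seed)
    W = W.copy()
    for _ in range(steps):
        alpha_bar = rng.uniform(0.1, 0.9, size=n_samples)   # random noise levels
        eps = rng.standard_normal((n_samples, x.size))      # random noise draws
        _, grad = diffusion_loss_and_grad(W, x, prototypes, alpha_bar, eps)
        W -= lr * grad                   # the toy denoiser has no parameters to update
    return W

# Hypothetical example: 3 classes, 4-dimensional "images".
prototypes = np.eye(3, 4)                # one prototype per class
x = np.array([0.1, 0.9, 0.0, 0.0])       # test image closest to class 1
W0 = np.array([[0.5, 0.5, 0.0, 0.0],
               [0.3, 0.3, 0.0, 0.0],
               [0.0, 0.0, 0.0, 0.0]])    # initial classifier prefers class 0
W1 = adapt(W0, x, prototypes)
print("before:", np.argmax(W0 @ x), "after:", np.argmax(W1 @ x))
```

In this toy the generative loss is smallest when the condition built from the classifier's probabilities matches the test image, so gradient descent on the classifier alone shifts its prediction toward the class whose prototype best explains the image. A realistic version would replace the closed-form denoiser with a pre-trained diffusion network and rely on an autodiff framework for the backward pass.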