Vision-language models (VLMs) suffer performance degradation under domain shift, limiting their real-world applicability. Existing test-time adaptation methods are computationally intensive, rely on back-propagation, and often focus on a single modality. To address these issues, we propose Training-free Test-Time Adaptation with Brownian Distance Covariance (TaTa). TaTa leverages Brownian Distance Covariance, a powerful statistical measure that captures both linear and nonlinear dependencies via pairwise distances, to dynamically adapt VLMs to new domains without training or back-propagation. This not only improves efficiency but also enhances stability by avoiding disruptive weight updates. TaTa further integrates attribute-enhanced prompting to improve vision-language inference with descriptive visual cues. Combined with dynamic clustering and pseudo-label refinement, it effectively recalibrates the model for novel visual contexts. Experiments across diverse datasets show that TaTa significantly reduces computational cost while achieving state-of-the-art performance in both domain generalization and cross-dataset generalization.
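The abstract does not give implementation details, but Brownian Distance Covariance is known to coincide with Székely and Rizzo's distance covariance, which can be computed directly from pairwise distance matrices. The sketch below (a minimal illustration, not the paper's method; the function names `dcov`/`dcor` and the 1-D inputs are our own choices) shows why the statistic detects nonlinear dependence that Pearson correlation misses:

```python
import numpy as np

def dcov(x, y):
    """Sample distance covariance between two 1-D samples of equal length."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise (Euclidean) distance matrices
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double-center each distance matrix
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    # Squared distance covariance is the mean of the elementwise products
    return np.sqrt(np.maximum((A * B).mean(), 0.0))

def dcor(x, y):
    """Distance correlation in [0, 1]; zero (in the limit) iff independence."""
    denom = np.sqrt(dcov(x, x) * dcov(y, y))
    return dcov(x, y) / denom if denom > 0 else 0.0
```

For a symmetric sample `x` and the purely nonlinear relation `y = x**2`, Pearson correlation is near zero while `dcor(x, y)` is clearly positive, which is the property the abstract credits with enabling adaptation without gradient updates.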