How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation

In machine learning, generalization under distribution shifts, where deployment conditions diverge from the training scenarios, is crucial, particularly in fields like climate modeling, biomedicine, and autonomous driving. The emergence of foundation models, distinguished by their extensive pretraining and task versatility, has increased interest in their adaptability to distribution shifts. GPT-4V(ision) stands as the most advanced publicly accessible multimodal foundation model, with extensive applications across various domains, including anomaly detection, video understanding, image generation, and medical diagnosis. However, its robustness to distribution shifts remains largely underexplored. To address this gap, this study rigorously evaluates GPT-4V's adaptability and generalization capabilities in dynamic environments, benchmarking it against prominent models like CLIP and LLaVA. We examine GPT-4V's zero-shot generalization across 13 diverse datasets spanning natural, medical, and molecular domains. We further investigate its adaptability to controlled data perturbations and examine the efficacy of in-context learning as a tool to enhance its adaptation. Our findings delineate GPT-4V's capability boundaries under distribution shifts, shedding light on its strengths and limitations across various scenarios. More broadly, this investigation contributes to our understanding of how AI foundation models generalize under distribution shifts, offering insights into their adaptability and robustness. Code is publicly available at https://github.com/jameszhou-gl/gpt-4v-distribution-shift.
