Waffling around for Performance: Visual Classification with Random Words and Broad Concepts

The visual classification performance of vision-language models such as CLIP can benefit from additional semantic knowledge, e.g. via large language models (LLMs) such as GPT-3. Further extending class names with LLM-generated class descriptors, e.g. "waffle, which has a round shape", or averaging retrieval scores over multiple such descriptors, has been shown to improve generalization performance. In this work, we study this behavior in detail and propose WaffleCLIP, a framework for zero-shot visual classification which achieves similar performance gains on a large number of visual classification tasks by simply replacing LLM-generated descriptors with random character and word descriptors, without querying external models. We extend these results with an extensive experimental study on the impact and shortcomings of additional semantics introduced via LLM-generated descriptors, and showcase how semantic context is better leveraged by automatically querying LLMs for high-level concepts, while jointly resolving potential class name ambiguities. Link to the codebase: https://github.com/ExplainableML/WaffleCLIP.
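The core idea described above (random character and word descriptors in place of LLM-generated ones, with scores averaged over several descriptors per class) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt template is an assumption modeled on the abstract's "waffle, which has a round shape" example, the function names are hypothetical, and `encode_text` is a placeholder for whatever text encoder is used (for instance CLIP's), which the caller has to supply.

```python
import random
import string


def random_char_descriptor(rng, num_words=2, word_len=5):
    """Build a descriptor made of random lowercase character 'words'."""
    return " ".join(
        "".join(rng.choice(string.ascii_lowercase) for _ in range(word_len))
        for _ in range(num_words)
    )


def random_word_descriptor(rng, vocabulary, num_words=2):
    """Build a descriptor from words sampled at random from a vocabulary."""
    return " ".join(rng.choice(vocabulary) for _ in range(num_words))


def build_prompts(classname, descriptors):
    # Assumed template, modeled on "waffle, which has a round shape".
    return [f"{classname}, which has {d}" for d in descriptors]


def dot(a, b):
    """Plain dot product; equals cosine similarity for unit-norm vectors."""
    return sum(x * y for x, y in zip(a, b))


def classify(image_embedding, class_prompts, encode_text):
    """Average the image-text similarity over each class's prompts and
    return the best-scoring class together with all class scores."""
    scores = {}
    for classname, prompts in class_prompts.items():
        sims = [dot(image_embedding, encode_text(p)) for p in prompts]
        scores[classname] = sum(sims) / len(sims)
    return max(scores, key=scores.get), scores
```

A possible way to use it: draw a handful of descriptors once with a seeded `random.Random`, call `build_prompts` for every class name, and pass the resulting dictionary to `classify` along with an image embedding and the text encoder. Reusing the same descriptor list for all classes is a choice made in this sketch; the abstract does not specify it.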

