Towards Open Vocabulary Learning: A Survey

In the field of visual scene understanding, deep neural networks have made impressive advances in core tasks such as segmentation, tracking, and detection. However, most approaches rely on the closed-set assumption: the model can only identify the pre-defined categories present in the training set. Recently, open-vocabulary settings have been proposed, driven by rapid progress in vision-language pre-training. These approaches seek to locate and recognize categories beyond the annotated label space. Compared with weakly supervised and zero-shot settings, the open-vocabulary setting is more general, practical, and effective. This paper provides a thorough review of open-vocabulary learning, summarizing and analyzing recent developments in the field. We begin by comparing it with related concepts such as zero-shot learning, open-set recognition, and out-of-distribution detection. We then review several closely related tasks in segmentation and detection, including long-tail problems and few-shot and zero-shot settings. For the method survey, we first present the basics of closed-set detection and segmentation as preliminaries. Next, we examine the various scenarios in which open-vocabulary learning is used, identifying common design elements and core ideas. We then compare recent detection and segmentation approaches on commonly used datasets and benchmarks. Finally, we conclude with insights, open issues, and a discussion of future research directions. To our knowledge, this is the first comprehensive literature review of open-vocabulary learning. We continue to track related work at https://github.com/jianzongwu/Awesome-Open-Vocabulary.
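To make the contrast between closed-set and open-vocabulary recognition concrete, here is a minimal sketch of one common pattern: a fixed classifier is replaced by text embeddings of category names, so the label set can be changed at inference time. This is an illustration under simplifying assumptions, not any specific method from the survey. The text embeddings are hypothetical hand-picked 3-d vectors standing in for a CLIP-style text encoder, and the names `TOY_TEXT_EMBEDDINGS` and `classify` as well as the temperature value are illustrative choices.

```python
import math

# Hypothetical toy embeddings standing in for a vision-language text encoder.
# In practice these would be produced by encoding prompts like "a photo of a {name}".
TOY_TEXT_EMBEDDINGS = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "zebra": [0.0, 0.0, 1.0],  # a "novel" class outside the training label space
}


def normalize(v):
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def classify(region_feature, vocabulary, temperature=0.01):
    """Score one region embedding against the text embeddings of any vocabulary.

    Returns the best-matching category name and the softmax probabilities
    over the given vocabulary (in the same order as `vocabulary`).
    """
    r = normalize(region_feature)
    # Cosine similarity between the region and each category's text embedding.
    logits = []
    for name in vocabulary:
        t = normalize(TOY_TEXT_EMBEDDINGS[name])
        logits.append(sum(a * b for a, b in zip(r, t)) / temperature)
    # Numerically stable softmax over the vocabulary.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    best = max(range(len(vocabulary)), key=lambda i: probs[i])
    return vocabulary[best], probs


if __name__ == "__main__":
    region = [0.1, 0.2, 0.9]  # hypothetical region feature from an image encoder
    # Closed-set: only training categories are available, so the region is forced into one.
    print(classify(region, ["cat", "dog"])[0])
    # Open vocabulary: adding a new category name requires no retraining.
    print(classify(region, ["cat", "dog", "zebra"])[0])
```

With the closed vocabulary the toy region is assigned to `dog`, the closer of the two training classes; once `zebra` is added to the vocabulary, the same region matches it instead. The design point is that the classifier weights are just text embeddings, so extending the label space amounts to encoding new category names.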

翻译：在视觉场景理解领域，深度神经网络已在分割、跟踪和检测等核心任务中取得显著进展。然而，大多数方法基于封闭集假设运行，即模型仅能识别训练集中预定义的类别。近年来，得益于视觉语言预训练的快速发展，开放词汇设置被提出。这类新方法旨在定位并识别超出标注标签空间之外的类别。与弱监督和零样本设置相比，开放词汇方法更具通用性、实用性和有效性。本文对开放词汇学习进行了全面综述，总结并分析了该领域的最新进展。具体而言，我们首先将其与零样本学习、开放集识别和分布外检测等相关概念进行比较。随后，在分割和检测场景下回顾了若干密切相关的任务，包括长尾问题、少样本和零样本设置。在方法综述部分，我们首先介绍封闭集下检测和分割的基础知识作为预备知识。接着，考察开放词汇学习应用的多种场景，识别常见设计要素与核心思想。然后，我们在常用数据集和基准上比较了最新的检测和分割方法。最后，我们总结观点、问题并讨论未来研究方向。据我们所知，这是开放词汇学习的首篇综合性文献综述。我们持续在 https://github.com/jianzongwu/Awesome-Open-Vocabulary 跟踪相关研究工作。