In visual scene understanding, deep neural networks have made impressive advances in core tasks such as segmentation, tracking, and detection. However, most approaches operate under the closed-set assumption, meaning that the model can only identify pre-defined categories that are present in the training set. Recently, open vocabulary settings have been proposed, owing to the rapid progress of vision-language pre-training. These new approaches seek to locate and recognize categories beyond the annotated label space. The open vocabulary approach is more general, practical, and effective than weakly supervised and zero-shot settings. This paper provides a thorough review of open vocabulary learning, summarizing and analyzing recent developments in the field. In particular, we begin by comparing it to related concepts such as zero-shot learning, open-set recognition, and out-of-distribution detection. Then, we review several closely related tasks for segmentation and detection, including long-tail problems, few-shot, and zero-shot settings. For the method survey, we first present the basics of closed-set detection and segmentation as preliminary knowledge. Next, we examine various scenarios in which open vocabulary learning is used, identifying common design elements and core ideas. Then, we compare recent detection and segmentation approaches on commonly used datasets and benchmarks. Finally, we conclude with insights, issues, and discussions regarding future research directions. To our knowledge, this is the first comprehensive literature review of open vocabulary learning. We continue to track related work at https://github.com/jianzongwu/Awesome-Open-Vocabulary.