In the field of visual scene understanding, deep neural networks have made impressive advances in core tasks such as segmentation, tracking, and detection. However, most approaches operate under the closed-set assumption: the model can only identify the pre-defined categories present in the training set. Recently, open vocabulary settings have been proposed, driven by the rapid progress of vision-language pre-training. These approaches seek to locate and recognize categories beyond the annotated label space. The open vocabulary setting is more general, practical, and effective than weakly supervised and zero-shot settings. This paper provides a thorough review of open vocabulary learning, summarizing and analyzing recent developments in the field. In particular, we begin by comparing it to related concepts such as zero-shot learning, open-set recognition, and out-of-distribution detection. Then, we review several closely related tasks in segmentation and detection, including long-tail, few-shot, and zero-shot settings. For the method survey, we first present the fundamentals of closed-set detection and segmentation as preliminaries. Next, we examine the various scenarios in which open vocabulary learning is used, identifying common design elements and core ideas. Then, we compare recent detection and segmentation approaches on commonly used datasets and benchmarks. Finally, we conclude with insights, open issues, and discussions of future research directions. To our knowledge, this is the first comprehensive literature review of open vocabulary learning. We continue to track related works at https://github.com/jianzongwu/Awesome-Open-Vocabulary.