Recent years have seen growing interest in general-purpose computer vision systems that can handle a wide range of vision tasks at once, without being restricted to specific problems or data domains. Such universality is crucial for real-world computer vision applications. In this study, we focus on one step toward that goal: large-scale, multi-domain universal object detection. The problem raises several challenges, including duplicated category labels across datasets, label conflicts, and the need to handle hierarchical taxonomies. To address them, we present our approach to label handling, hierarchy-aware loss design, and resource-efficient model training that builds on a pre-trained large vision model. Our method placed second in the object detection track of the Robust Vision Challenge 2022 (RVC 2022), evaluated on a million-scale cross-dataset object detection benchmark. We hope this study will serve as a useful reference and offer an alternative approach for similar problems in the computer vision community. The source code is available at https://github.com/linfeng93/Large-UniDet.