Instance-level recognition (ILR) focuses on identifying individual objects rather than broad categories, offering the highest granularity in image classification. However, this fine-grained nature makes creating large-scale annotated datasets challenging, limiting ILR's real-world applicability across domains. To overcome this limitation, we introduce a novel approach that synthetically generates diverse object instances from multiple domains under varied conditions and backgrounds, forming a large-scale training set. Unlike prior work on automatic data synthesis, our method is the first to address ILR-specific challenges without relying on any real images. Fine-tuning foundation vision models on the generated data significantly improves retrieval performance across seven ILR benchmarks spanning multiple domains. Our approach offers an efficient and effective alternative to extensive data collection and curation, and introduces a new ILR paradigm in which the only input is the names of the target domains, unlocking a wide range of real-world applications.