Visual Product Search Benchmark

Reliable product identification from images is a critical requirement in industrial and commercial applications, particularly in maintenance, procurement, and operational workflows where incorrect matches can lead to costly downstream failures. At the core of such systems lies the visual search component, which must retrieve and rank the exact object instance from large and continuously evolving catalogs under diverse imaging conditions. This report presents a structured benchmark of modern visual embedding models for instance-level image retrieval, with a focus on industrial applications. Open-source foundation embedding models, proprietary multi-modal embedding systems, and domain-specific vision-only models are evaluated under a unified image-to-image retrieval protocol. The benchmark comprises curated datasets: industrial datasets derived from production deployments in Manufacturing, Automotive, DIY, and Retail, as well as established public benchmarks. Evaluation is conducted without post-processing, which isolates the retrieval capability of each model. The results show how well contemporary foundation and unified embedding models transfer to fine-grained instance retrieval tasks, and how they compare to models explicitly trained for industrial applications. By emphasizing realistic constraints, heterogeneous image conditions, and exact instance matching requirements, this benchmark aims to inform both practitioners and researchers about the strengths and limitations of current visual embedding approaches in production-level product identification systems. An interactive companion website presenting the benchmark results, evaluation details, and additional visualizations is available at https://benchmark.nyris.io.
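
To make the image-to-image retrieval protocol concrete, the following is a minimal sketch of how instance-level retrieval without post-processing could be scored from precomputed embeddings. The abstract does not state which metric or similarity measure the benchmark uses, so cosine similarity, recall@k, the function name `recall_at_k`, and the toy data are all assumptions made for illustration only.

```python
import numpy as np

def recall_at_k(query_emb, query_ids, gallery_emb, gallery_ids, k=1):
    """Fraction of queries whose exact instance appears among the top-k gallery hits.

    Assumption: cosine similarity and recall@k; the benchmark's actual
    metric is not specified in the abstract.
    """
    # L2-normalise so that the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sims = q @ g.T  # shape: (num_queries, num_gallery)
    # Rank gallery items by descending similarity; no re-ranking or filtering.
    top_k = np.argsort(-sims, axis=1)[:, :k]
    hits = (gallery_ids[top_k] == query_ids[:, None]).any(axis=1)
    return float(hits.mean())

# Hypothetical toy data: three catalog items and three query images.
gallery_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
gallery_ids = np.array([10, 20, 30])
query_emb = np.array([[0.9, 0.1], [0.1, 0.9], [1.0, 0.1]])
query_ids = np.array([10, 20, 30])

r1 = recall_at_k(query_emb, query_ids, gallery_emb, gallery_ids, k=1)
r2 = recall_at_k(query_emb, query_ids, gallery_emb, gallery_ids, k=2)
```

In this toy setup the third query is ranked closest to the wrong catalog item, so it counts as a miss at k=1 and as a hit at k=2, which is the kind of exact-instance error such a protocol is meant to expose.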
