Quantitative metrics are central to evaluating computer vision (CV) models, but they often fail to capture real-world performance due to protocol inconsistencies and ground-truth noise. While visual perception studies can complement these metrics, they typically require end-to-end systems that are time-consuming to implement and experimental setups that are difficult to reproduce. We systematically summarize key challenges in evaluating CV models and present the design of ARCADE, an evaluation platform that leverages augmented reality (AR) to enable easy, reproducible, and human-centered CV evaluation. ARCADE uses a modular architecture that provides cross-platform data collection, pluggable model inference, and interactive AR tasks, supporting both metric-based and visual perception evaluation. We demonstrate ARCADE through a user study with 15 participants and case studies on two representative CV tasks, depth estimation and lighting estimation, showing that ARCADE can reveal perceptual flaws in model output that traditional metrics often miss. We also evaluate ARCADE's usability and performance, showing that it is a flexible and reliable real-time platform.