Computer vision (CV) pipelines are typically evaluated on datasets processed by image signal processing (ISP) pipelines, even though an important research goal for resource-constrained applications is to avoid as many ISP steps as possible. In particular, most CV datasets consist of global shutter (GS) images, even though most cameras today use a rolling shutter (RS). This paper studies the impact of the shutter mechanism on machine learning (ML) object detection models, using a synthetic dataset that we generate with the simulation capabilities of Unreal Engine 5 (UE5). Specifically, we train and evaluate mainstream detection models on our synthetically generated, paired GS and RS datasets to determine whether detection accuracy differs significantly between the two shutter modalities, especially when capturing low-speed objects (e.g., pedestrians). The results from this emulation framework indicate that the two modalities perform nearly identically on coarse-grained detection (mean average precision (mAP) at IoU = 0.5) but differ significantly on fine-grained measures of detection accuracy (mAP at IoU = 0.5:0.95). This implies that ML pipelines may not need explicit RS correction for many object detection applications, but that mitigating RS effects in ISP-less ML pipelines that target fine-grained localization of objects may need additional research.
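To illustrate why the coarse-grained and fine-grained metrics can diverge, the following minimal sketch computes the intersection over union (IoU) between a ground-truth box and a slightly displaced prediction, then lists which of the 0.5:0.95 thresholds the prediction still satisfies. The box coordinates and the 4-pixel shift are hypothetical values chosen for illustration; they are not taken from the paper's dataset. The sketch also covers only the box-matching step, not the full precision-recall averaging that mAP involves.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


# IoU thresholds 0.50, 0.55, ..., 0.95 (the "0.5:0.95" range).
THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred, gt):
    """Thresholds at which `pred` would count as a match for `gt`."""
    score = iou(pred, gt)
    return [t for t in THRESHOLDS if score >= t]


# Hypothetical 40x100 pedestrian box and a prediction shifted 4 px sideways
# (an assumed displacement, standing in for a small localization error).
gt = (100, 100, 140, 200)
pred = (104, 100, 144, 200)

print(f"IoU = {iou(pred, gt):.3f}")
print("matched at:", thresholds_passed(pred, gt))
```

In this example the shifted box has an IoU of about 0.82, so it counts as a correct detection at IoU = 0.5 but is rejected at the 0.85, 0.90, and 0.95 thresholds. A small positional offset can therefore leave the coarse-grained metric unchanged while lowering the averaged 0.5:0.95 metric.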