Despite significant progress in optical character recognition (OCR) and computer vision, robust text recognition and person identification in images captured in unconstrained \emph{in-the-wild} environments remain open challenges. Yet these obstacles must be overcome in practical applications of vision systems, such as identifying racers in photos taken during off-road racing events. To this end, we introduce two new challenging real-world datasets, the off-road motorcycle Racer Number Dataset (RND) and the Muddy Racer re-iDentification Dataset (MUDD), to highlight the shortcomings of current methods and drive advances in OCR and person re-identification (ReID) under extreme conditions. Together, the two datasets contain over 6,300 images from off-road competitions exhibiting factors that undermine even modern vision systems, notably mud, complex poses, and motion blur. We establish benchmark performance on both datasets using state-of-the-art models. Off-the-shelf models transfer poorly, reaching only 15% end-to-end (E2E) F1 score on text spotting and 33% rank-1 accuracy on ReID. Fine-tuning yields major improvements, raising performance to 53% E2E F1 on text spotting and 79% rank-1 accuracy on ReID, yet both results leave substantial room for improvement. Our analysis exposes open problems in real-world OCR and ReID that call for domain-targeted techniques. With these datasets and our analysis of model limitations, we aim to foster innovation in handling real-world conditions such as mud and complex poses, and thereby drive progress in robust computer vision. All data was sourced from PerformancePhoto.co, a website used by professional motorsports photographers, racers, and fans. The top-performing text spotting and ReID models are deployed on this platform to power real-time race photo search.