With the advance of AI, road object detection has become a prominent topic in computer vision, mostly using perspective cameras. Fisheye lenses provide omnidirectional, wide coverage, so fewer cameras are needed to monitor road intersections, but they introduce view distortions. To our knowledge, there is no existing open dataset prepared for traffic surveillance with fisheye cameras. This paper introduces FishEye8K, an open benchmark dataset for road object detection tasks, which comprises 157K bounding boxes across five classes (Pedestrian, Bike, Car, Bus, and Truck). In addition, we present benchmark results of state-of-the-art (SoTA) models, including variations of YOLOv5, YOLOR, YOLOv7, and YOLOv8. The dataset comprises 8,000 images recorded in 22 videos using 18 fisheye cameras for traffic monitoring in Hsinchu, Taiwan, at resolutions of 1080$\times$1080 and 1280$\times$1280. The data annotation and validation processes were arduous and time-consuming because the ultra-wide panoramic and hemispherical fisheye images have large distortion and contain numerous road participants, particularly people riding scooters. To avoid bias, all frames from a given camera were assigned to either the training set or the test set, maintaining a ratio of about 70:30 for both the number of images and the number of bounding boxes in each class. Experimental results show that YOLOv8 and YOLOR perform best at input sizes of 640$\times$640 and 1280$\times$1280, respectively. The dataset will be available on GitHub in PASCAL VOC, MS COCO, and YOLO annotation formats. The FishEye8K benchmark will contribute significantly to fisheye video analytics and smart city applications.
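Since the annotations are offered in three formats, it may help to recall how their bounding-box conventions usually relate. The sketch below is illustrative only: it assumes the conventions commonly associated with each format (PASCAL VOC as absolute corner coordinates, MS COCO as absolute top-left corner plus width and height, YOLO as center and size normalized by the image dimensions). The function names are hypothetical, and the exact layout of the released FishEye8K files is not specified here.

```python
from typing import Tuple

# A box is four floats; the meaning of each float depends on the format.
Box = Tuple[float, float, float, float]


def voc_to_coco(box: Box) -> Box:
    """Convert (xmin, ymin, xmax, ymax) in pixels to (x, y, width, height) in pixels."""
    xmin, ymin, xmax, ymax = box
    return (xmin, ymin, xmax - xmin, ymax - ymin)


def voc_to_yolo(box: Box, img_w: int, img_h: int) -> Box:
    """Convert (xmin, ymin, xmax, ymax) in pixels to normalized (cx, cy, w, h)."""
    xmin, ymin, xmax, ymax = box
    cx = (xmin + xmax) / 2 / img_w   # box center, as a fraction of image width
    cy = (ymin + ymax) / 2 / img_h   # box center, as a fraction of image height
    w = (xmax - xmin) / img_w        # box width, as a fraction of image width
    h = (ymax - ymin) / img_h        # box height, as a fraction of image height
    return (cx, cy, w, h)


if __name__ == "__main__":
    # Hypothetical box in a 1280x1280 frame (values chosen for illustration).
    voc_box = (320, 320, 960, 640)
    print(voc_to_coco(voc_box))              # (320, 320, 640, 320)
    print(voc_to_yolo(voc_box, 1280, 1280))  # (0.5, 0.375, 0.5, 0.25)
```

Because YOLO coordinates are normalized, the same label applies unchanged when a frame is resized to 640$\times$640 or 1280$\times$1280, whereas VOC and COCO boxes must be rescaled with the image.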