Urban traffic environments present unique challenges for object detection, particularly with the growing presence of micromobility vehicles such as e-scooters and bikes. To address this problem, this work introduces an adapted detection model that combines the accuracy and speed of single-frame object detection with the richer features offered by video object detection frameworks. This is done by aggregating feature maps from consecutive frames, processed through motion flow, and applying them to the YOLOX architecture. This fusion adds a temporal perspective to YOLOX's detection abilities, allowing for a better understanding of urban mobility patterns and improving detection reliability. Tested on a custom dataset curated for urban micromobility scenarios, our model shows substantial improvement over existing state-of-the-art methods, demonstrating the need to consider spatio-temporal information when detecting such small and thin objects. Our approach enhances detection in challenging conditions: it handles occlusions, maintains temporal consistency, and mitigates motion blur.
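To make the aggregation step more concrete, the following is a minimal NumPy sketch of one common way to fuse feature maps across frames using motion flow: each neighboring frame's feature map is warped toward the reference frame, and the warped maps are averaged with per-pixel weights based on cosine similarity to the reference. This is an illustrative assumption about how such a fusion can work, not the implementation described in this work; the function names `warp_features` and `aggregate`, the `(C, H, W)` layout, the flow convention, and the similarity-based weighting are all hypothetical choices made for the example.

```python
import numpy as np


def warp_features(feat, flow):
    """Bilinearly warp a (C, H, W) feature map toward the reference frame.

    Assumed convention: flow[0] and flow[1] hold the x and y displacement,
    in feature-map pixels, from each reference location to the matching
    location in `feat`.
    """
    _, h, w = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Sampling positions, clamped to the border of the map.
    x = np.clip(xs + flow[0], 0, w - 1)
    y = np.clip(ys + flow[1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = x - x0
    wy = y - y0
    # Interpolate along x on the two neighboring rows, then along y.
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def aggregate(ref_feat, neighbor_feats, flows, eps=1e-8):
    """Fuse the reference map with flow-warped neighbor maps.

    Weights are a per-pixel softmax over the cosine similarity between
    each (warped) map and the reference map.
    """
    warped = [ref_feat] + [
        warp_features(f, fl) for f, fl in zip(neighbor_feats, flows)
    ]
    stack = np.stack(warped)  # (T, C, H, W)
    ref_unit = ref_feat / (np.linalg.norm(ref_feat, axis=0, keepdims=True) + eps)
    stack_unit = stack / (np.linalg.norm(stack, axis=1, keepdims=True) + eps)
    sim = (stack_unit * ref_unit[None]).sum(axis=1)  # (T, H, W)
    sim = sim - sim.max(axis=0, keepdims=True)  # numerically stable softmax
    weights = np.exp(sim)
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights[:, None] * stack).sum(axis=0)  # (C, H, W)
```

In a detector such as YOLOX, a fused map of this kind would take the place of the single-frame feature map passed to the detection head; where exactly the fusion is inserted, and how the flow is estimated, are design choices this sketch leaves open.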