Urban traffic environments present unique challenges for object detection, particularly with the increasing presence of micromobility vehicles such as e-scooters and bikes. To address this detection problem, this work introduces an adapted detection model that combines the accuracy and speed of single-frame object detection with the richer features offered by video object detection frameworks. This is achieved by feeding the YOLOX architecture with feature maps aggregated from consecutive frames aligned through motion flow. This fusion adds a temporal perspective to YOLOX's detection capabilities, allowing for a better understanding of urban mobility patterns and substantially improving detection reliability. Tested on a custom dataset curated for urban micromobility scenarios, our model shows substantial improvement over existing state-of-the-art methods, demonstrating the value of spatio-temporal information for detecting such small and thin objects. Our approach enhances detection under challenging conditions such as occlusion and motion blur while ensuring temporal consistency across frames.
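The core mechanism the abstract describes, warping feature maps from neighbouring frames along a motion-flow field and aggregating them with the current frame's features, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the function names (`warp_features`, `aggregate`), the nearest-neighbour sampling, and the uniform aggregation weights are simplifications for clarity, whereas the actual model operates on YOLOX backbone feature maps with learned flow and adaptive weights.

```python
import numpy as np

def warp_features(feat, flow):
    """Warp a (C, H, W) feature map by a (2, H, W) flow field.

    flow[0] / flow[1] give, for each current-frame location, the x / y
    displacement to its source pixel in the neighbouring frame.
    Nearest-neighbour sampling is used here for simplicity (a real
    implementation would typically use bilinear sampling).
    """
    C, H, W = feat.shape
    ys, xs = np.mgrid[0:H, 0:W]
    src_y = np.clip(np.rint(ys + flow[1]).astype(int), 0, H - 1)
    src_x = np.clip(np.rint(xs + flow[0]).astype(int), 0, W - 1)
    return feat[:, src_y, src_x]

def aggregate(feat_cur, feats_prev, flows, weights=None):
    """Blend the current feature map with flow-warped neighbour maps.

    Uniform weights are assumed when none are given; flow-guided
    aggregation methods usually learn per-location adaptive weights.
    """
    stack = [feat_cur] + [warp_features(f, fl) for f, fl in zip(feats_prev, flows)]
    if weights is None:
        weights = np.full(len(stack), 1.0 / len(stack))
    return np.tensordot(np.asarray(weights), np.stack(stack), axes=1)
```

The aggregated map has the same shape as a single-frame feature map, so it can replace the per-frame features consumed by a single-frame detector head without architectural changes, which is what makes this kind of fusion attractive for a fast detector like YOLOX.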