We introduce YOLO11-JDE, a fast and accurate multi-object tracking (MOT) solution that combines real-time object detection with self-supervised Re-Identification (Re-ID). By incorporating a dedicated Re-ID branch into YOLO11s, our model performs Joint Detection and Embedding (JDE), generating an appearance feature for each detection. The Re-ID branch is trained in a fully self-supervised setting, jointly with detection, which eliminates the need for costly identity-labeled datasets. Discriminative embeddings are learned with the triplet loss, using hard positive and semi-hard negative mining strategies. Data association is enhanced by a custom tracking implementation that successfully integrates motion, appearance, and location cues. YOLO11-JDE achieves competitive results on the MOT17 and MOT20 benchmarks, surpasses existing JDE methods in FPS, and uses up to ten times fewer parameters. This makes our method a highly attractive solution for real-world applications.
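The abstract does not spell out the loss formulation, so the following is only a minimal sketch of one common way to pair hard positive mining with semi-hard negative selection. For each anchor it takes the farthest same-identity sample as the positive, then picks the closest negative that is still farther away than that positive (the semi-hard region of FaceNet-style mining), and applies the hinge max(0, d(a,p) - d(a,n) + margin). The function name, the Euclidean distance, the margin of 0.3, and the fallback used when every negative is closer than the positive are all assumptions for illustration. None of them is taken from the YOLO11-JDE implementation. How the self-supervised identity labels are produced is also outside the scope of this sketch.

```python
import math
from typing import List, Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss_hard_pos_semihard_neg(
    embeddings: List[Sequence[float]],
    labels: List[int],
    margin: float = 0.3,  # hypothetical margin value
) -> float:
    """Mean triplet loss over anchors that have at least one positive and one negative.

    Hypothetical helper: hard positive = farthest same-label sample;
    semi-hard negative = closest different-label sample farther than that positive.
    """
    n = len(embeddings)
    dist = [[euclidean(embeddings[i], embeddings[j]) for j in range(n)] for i in range(n)]
    losses = []
    for a in range(n):
        pos = [dist[a][j] for j in range(n) if j != a and labels[j] == labels[a]]
        neg = [dist[a][j] for j in range(n) if labels[j] != labels[a]]
        if not pos or not neg:
            continue  # this anchor cannot form a triplet
        d_ap = max(pos)  # hard positive: the farthest sample of the same identity
        # Semi-hard candidates lie beyond the positive; the closest one is chosen.
        # If it falls outside d_ap + margin, the hinge below yields zero loss.
        farther = [d for d in neg if d > d_ap]
        # Assumed fallback: if every negative is closer than the positive, use the farthest one.
        d_an = min(farther) if farther else max(neg)
        losses.append(max(0.0, d_ap - d_an + margin))
    return sum(losses) / len(losses) if losses else 0.0
```

For example, two identities whose embeddings are far apart relative to the margin contribute no loss, while overlapping clusters produce a positive loss that pulls same-identity embeddings together. Choosing semi-hard rather than the very hardest negatives is a common way to avoid collapsed embeddings early in training.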