Single-object tracking (SOT) on edge devices is a critical computer vision task, requiring accurate and continuous target localization across video frames under occlusion, distractor interference, and fast motion. However, recent state-of-the-art distractor-aware memory mechanisms are largely built on segmentation-based trackers and rely on mask prediction and attention-driven memory updates, which introduce substantial computational overhead and limit real-time deployment on resource-constrained hardware. Lightweight trackers, by contrast, sustain high throughput but are prone to drift when visually similar distractors appear. To address these challenges, we propose EdgeDAM, a lightweight detection-guided tracking framework that reformulates distractor-aware memory for bounding-box tracking under strict edge constraints. EdgeDAM introduces two key strategies: (1) Dual-Buffer Distractor-Aware Memory (DAM), which integrates a Recent-Aware Memory to preserve temporally consistent target hypotheses and a Distractor-Resolving Memory to explicitly store hard negative candidates and penalize their re-selection during recovery; and (2) Confidence-Driven Switching with Held-Box Stabilization, where tracker reliability and temporal consistency criteria adaptively activate detection and memory-guided re-identification during occlusion, while a held-box mechanism temporarily freezes and expands the estimate to suppress distractor contamination. Extensive experiments on five benchmarks, including the distractor-focused DiDi dataset, demonstrate improved robustness under occlusion and fast motion while maintaining real-time performance on mobile devices: EdgeDAM achieves 88.2% accuracy on DiDi and runs at 25 FPS on an iPhone 15. Code will be released.
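The two strategies can be illustrated with a minimal Python sketch. It is not the EdgeDAM implementation: the abstract does not specify feature types, buffer sizes, scoring, or thresholds, so everything here is an assumption made for illustration. That includes the names `DualBufferMemory`, `choose_mode`, and `expand_box`, the use of cosine similarity on plain feature vectors, the penalty weight, and the rule that a brief confidence dip triggers a held box before re-detection.

```python
from collections import deque
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


class DualBufferMemory:
    """Hypothetical dual buffer: recent target features plus hard negatives."""

    def __init__(self, recent_size=8, distractor_size=8, penalty=0.5):
        # Bounded FIFO buffers; the sizes and penalty weight are assumed values.
        self.recent = deque(maxlen=recent_size)            # recent target hypotheses
        self.distractors = deque(maxlen=distractor_size)   # hard negative candidates
        self.penalty = penalty

    def add_target(self, feat):
        self.recent.append(feat)

    def add_distractor(self, feat):
        self.distractors.append(feat)

    def score(self, feat):
        """Best match to the target memory, minus a penalty for resembling a distractor."""
        pos = max((cosine(feat, f) for f in self.recent), default=0.0)
        neg = max((cosine(feat, f) for f in self.distractors), default=0.0)
        return pos - self.penalty * neg

    def reidentify(self, candidates):
        """Return the index of the best-scoring candidate, or None if there are none."""
        if not candidates:
            return None
        return max(range(len(candidates)), key=lambda i: self.score(candidates[i]))


def expand_box(box, scale):
    """Grow an (x, y, w, h) box about its center by the factor `scale`."""
    x, y, w, h = box
    new_w, new_h = w * scale, h * scale
    return (x - (new_w - w) / 2, y - (new_h - h) / 2, new_w, new_h)


def choose_mode(confidence, low_conf_streak, conf_thresh=0.5, patience=3):
    """Assumed switching rule: track when confident, hold the box during a
    short confidence dip, and fall back to detection plus memory-guided
    re-identification once the dip has lasted `patience` frames."""
    if confidence >= conf_thresh:
        return "track"
    if low_conf_streak < patience:
        return "hold"
    return "redetect"
```

In this sketch, a candidate that closely resembles a stored distractor can lose to a weaker but cleaner match, which is the intended effect of penalizing re-selection during recovery. During a `"hold"` phase the caller would keep the last box and pass it through `expand_box` rather than accept a new estimate.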