The TrackNet series has established a strong baseline for fast-moving small object tracking in sports. However, existing iterations face significant limitations: V1-V3 struggle with occlusions due to their reliance on purely visual cues, while TrackNetV4, despite introducing motion inputs, suffers from directional ambiguity because its absolute difference method discards motion polarity. To overcome these bottlenecks, we propose TrackNetV5, a robust architecture integrating two novel mechanisms. First, to recover the lost directional prior, we introduce the Motion Direction Decoupling (MDD) module. Unlike V4, MDD decomposes temporal dynamics into signed polarity fields, explicitly encoding both movement occurrence and trajectory direction. Second, we propose the Residual-Driven Spatio-Temporal Refinement (R-STR) head. Following a coarse-to-fine paradigm, this Transformer-based module leverages factorized spatio-temporal contexts to estimate a corrective residual, effectively recovering occluded targets. Extensive experiments on the TrackNetV2 dataset demonstrate that TrackNetV5 achieves a new state-of-the-art F1-score of 0.9859 and an accuracy of 0.9733, significantly outperforming previous versions. Notably, this performance leap comes at only a 3.7% increase in FLOPs over V4, maintaining real-time inference capability while delivering superior tracking precision.
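To make the motion-polarity point concrete, the sketch below contrasts a V4-style absolute frame difference with a signed polarity decomposition in the spirit of the MDD module. The abstract does not give the exact formulation, so the function names, the per-pixel positive/negative split, and the toy frames are illustrative assumptions rather than the authors' implementation.

```python
# Illustrative sketch (not the authors' code): contrasts TrackNetV4-style
# absolute frame differencing, which discards motion polarity, with a
# signed "polarity field" decomposition in the spirit of the MDD module.
# All names and shapes here are assumptions made for clarity.
import numpy as np

def absolute_motion_map(frame_prev: np.ndarray, frame_curr: np.ndarray) -> np.ndarray:
    """V4-style cue: magnitude of temporal change only (direction is lost)."""
    return np.abs(frame_curr.astype(np.float32) - frame_prev.astype(np.float32))

def signed_polarity_fields(frame_prev: np.ndarray, frame_curr: np.ndarray):
    """Hypothetical MDD-style cue: split the signed difference into a
    positive field (intensity appearing) and a negative field (intensity
    vanishing), so a network can see where the ball arrives and where it
    leaves between consecutive frames."""
    diff = frame_curr.astype(np.float32) - frame_prev.astype(np.float32)
    pos_field = np.maximum(diff, 0.0)   # regions the target moves into
    neg_field = np.maximum(-diff, 0.0)  # regions the target moves out of
    return pos_field, neg_field

if __name__ == "__main__":
    # Toy grayscale frames: a bright "ball" shifts two pixels to the right.
    prev_frame = np.zeros((8, 8), dtype=np.uint8)
    curr_frame = np.zeros((8, 8), dtype=np.uint8)
    prev_frame[4, 2] = 255
    curr_frame[4, 4] = 255

    abs_map = absolute_motion_map(prev_frame, curr_frame)
    pos, neg = signed_polarity_fields(prev_frame, curr_frame)

    # The absolute map responds identically at (4, 2) and (4, 4), so
    # left-vs-right movement is ambiguous; the polarity fields keep the
    # arrival (pos) and departure (neg) locations separate.
    print("abs nonzero:", np.argwhere(abs_map > 0).tolist())
    print("pos nonzero:", np.argwhere(pos > 0).tolist())
    print("neg nonzero:", np.argwhere(neg > 0).tolist())
```

In this toy case the absolute map lights up equally at the old and new ball positions, whereas the positive and negative fields keep arrival and departure separable, which is the directional prior the abstract attributes to MDD.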