The Siamese network has been the de facto benchmark framework for 3D LiDAR object tracking, with a shared-parameter encoder extracting features from the template and the search region separately. This paradigm relies heavily on an additional matching network to model the cross-correlation (similarity) between the template and the search region. In this paper, we forsake the conventional Siamese paradigm and propose a novel single-branch framework, SyncTrack, which synchronizes feature extraction and matching. This avoids forwarding the encoder twice, once for the template and once for the search region, and avoids the extra parameters of a matching network. The synchronization mechanism is based on the dynamic affinity of the Transformer, and we provide an in-depth theoretical analysis of its relevance. Moreover, building on the synchronization, we introduce a novel Attentive Points-Sampling strategy into the Transformer layers (APST), which replaces random sampling and Farthest Points Sampling (FPS) with sampling supervised by the attentive relations between the template and the search region. This connects point-wise sampling with feature learning, which helps aggregate more distinctive and geometric features for tracking with sparse points. Extensive experiments on two benchmark datasets (KITTI and NuScenes) show that SyncTrack achieves state-of-the-art performance in real-time tracking.
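The single-branch idea and the attention-guided sampling described above can be illustrated with a minimal NumPy sketch. It is not the SyncTrack implementation: the single-head attention, the random projection weights, the function names `joint_attention` and `attentive_sample`, and the top-k scoring rule (summing each search point's attention to all template points) are assumptions made for illustration only. It shows how one pass over the concatenated template and search tokens yields both features and a template-to-search affinity, and how that affinity could rank search points for sampling in place of random sampling or FPS.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def joint_attention(template, search, w_q, w_k, w_v):
    # One forward pass over the concatenated tokens: feature extraction and
    # template/search matching share the same attention map.
    tokens = np.concatenate([template, search], axis=0)   # (Nt + Ns, C)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))        # (Nt + Ns, Nt + Ns)
    return attn @ v, attn

def attentive_sample(attn, n_template, k):
    # Hypothetical scoring rule: rank each search point by how much attention
    # it pays to the template points, then keep the k highest-scoring points.
    cross = attn[n_template:, :n_template]                # (Ns, Nt)
    scores = cross.sum(axis=1)
    keep = np.argsort(-scores)[:k]
    return np.sort(keep)

# Toy inputs: random per-point features stand in for real LiDAR embeddings.
rng = np.random.default_rng(0)
Nt, Ns, C = 8, 32, 16
template = rng.normal(size=(Nt, C))
search = rng.normal(size=(Ns, C))
w_q, w_k, w_v = (rng.normal(size=(C, C)) / np.sqrt(C) for _ in range(3))

features, attn = joint_attention(template, search, w_q, w_k, w_v)
keep = attentive_sample(attn, Nt, k=16)
sampled_search = features[Nt:][keep]                      # (16, C)
```

In this sketch no separate matching module is needed, because the search-to-template block of the attention map already encodes the similarity that a Siamese tracker would compute afterwards.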