Point cloud-based 3D single object tracking (SOT) has drawn increasing attention. Despite many breakthroughs, we identify two severe issues in current approaches. First, through extensive analysis, we find that their prediction manner is not robust: there is a misalignment between the prediction score and the actual localization accuracy. Second, sparse point returns damage the feature matching procedure of the SOT task. Based on these insights, we introduce two novel modules, Adaptive Refine Prediction (ARP) and Target Knowledge Transfer (TKT), to tackle these issues, respectively. We first design a strong pipeline that extracts discriminative features and performs matching with an attention mechanism. The ARP module then addresses the misalignment issue by aggregating all predicted candidates with valuable clues. Finally, the TKT module overcomes the incompleteness of point clouds caused by sparsity and occlusion. We call the overall framework PCET. Extensive experiments on the KITTI dataset and the Waymo Open Dataset show that our model achieves state-of-the-art performance while maintaining a lower computational cost.
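To make the misalignment issue concrete, the following minimal sketch contrasts selecting the single highest-scoring candidate with aggregating all candidates. It is an illustration only and not the ARP module itself: the abstract does not specify how candidates are aggregated, so the softmax weighting, the `temperature` parameter, and the function names `pick_top_candidate` and `aggregate_candidates` are all assumptions introduced here. Candidates are reduced to 3D box centers for simplicity.

```python
import math


def pick_top_candidate(centers, scores):
    """Baseline: return the center of the single highest-scoring candidate."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return centers[best]


def aggregate_candidates(centers, scores, temperature=1.0):
    """Hypothetical aggregation: softmax-weighted mean of all candidate centers.

    This is an assumed stand-in for "aggregating all predicted candidates";
    the actual ARP design is not described in the abstract.
    """
    if not centers or len(centers) != len(scores):
        raise ValueError("centers and scores must be non-empty and equal in length")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    weights = [math.exp((s - top) / temperature) for s in scores]
    total = sum(weights)

    dims = len(centers[0])
    return tuple(
        sum(w * c[d] for w, c in zip(weights, centers)) / total
        for d in range(dims)
    )


if __name__ == "__main__":
    centers = [(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)]
    scores = [0.4, 0.6]
    print("top candidate:", pick_top_candidate(centers, scores))
    print("aggregated:   ", aggregate_candidates(centers, scores))
```

When the scores are nearly tied, the top-candidate rule commits fully to one box even though its score says little about which box is better localized, whereas the weighted mean lets every candidate contribute. A small `temperature` makes the aggregation approach the top-candidate rule.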