A major bottleneck in scaling up the training of self-driving perception systems is the human annotation required for supervision. A promising alternative is to leverage "auto-labelling" offboard perception models that are trained to automatically generate annotations from raw LiDAR point clouds at a fraction of the cost. Auto-labels are most commonly generated via a two-stage approach: first, objects are detected and tracked over time; then, each object trajectory is passed to a learned refinement model to improve accuracy. Since existing refinement models are overly complex and lack advanced temporal reasoning capabilities, in this work we propose LabelFormer, a simple, efficient, and effective trajectory-level refinement approach. Our approach first encodes each frame's observations separately, then exploits self-attention to reason about the trajectory with full temporal context, and finally decodes the refined object size and per-frame poses. Evaluation on both urban and highway datasets demonstrates that LabelFormer outperforms existing works by a large margin. Finally, we show that training on a dataset augmented with auto-labels generated by our method leads to improved downstream detection performance compared to existing methods. For details, please visit the project website: https://waabi.ai/labelformer
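The three steps named above (per-frame encoding, self-attention over the whole trajectory, decoding one size plus per-frame poses) can be illustrated with a minimal, untrained sketch. It is not the LabelFormer implementation: the feature width `D`, the mean-pooled toy encoder, the single attention head, the residual decoders, and all function and parameter names are assumptions made for illustration, and the weights are random rather than learned.

```python
import math
import random

D = 8  # hypothetical feature width, chosen only for this sketch


def rand_matrix(rows, cols, rng):
    """Random (untrained) weight matrix stored as a list of rows."""
    s = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-s, s) for _ in range(cols)] for _ in range(rows)]


def init_params(rng):
    # All layer names and shapes here are illustrative assumptions.
    return {
        "enc": rand_matrix(D, 7, rng),   # per-frame encoder
        "q": rand_matrix(D, D, rng),     # attention query projection
        "k": rand_matrix(D, D, rng),     # attention key projection
        "v": rand_matrix(D, D, rng),     # attention value projection
        "pose": rand_matrix(3, D, rng),  # per-frame (dx, dy, dheading) head
        "size": rand_matrix(2, D, rng),  # trajectory-level (dlength, dwidth) head
    }


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def encode_frame(points, box, W_enc):
    """Step 1: encode one frame independently of all other frames.

    points: list of (x, y) LiDAR points; box: (x, y, heading, length, width).
    Toy encoder: mean point offset from the box centre, concatenated with the box.
    """
    x, y, heading, length, width = box
    n = max(len(points), 1)  # an empty frame yields zero offsets
    mean_dx = sum(px - x for px, _ in points) / n
    mean_dy = sum(py - y for _, py in points) / n
    return matvec(W_enc, [mean_dx, mean_dy, x, y, heading, length, width])


def self_attention(X, Wq, Wk, Wv):
    """Step 2: every frame attends to every frame in the trajectory."""
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    V = [matvec(Wv, x) for x in X]
    d = len(Q[0])
    out = []
    for q in Q:
        w = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K])
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(d)])
    return out


def refine_trajectory(frames, boxes, params):
    """Step 3: decode one object size and one pose per frame."""
    feats = [encode_frame(p, b, params["enc"]) for p, b in zip(frames, boxes)]
    ctx = self_attention(feats, params["q"], params["k"], params["v"])
    ctx = [[f + c for f, c in zip(fr, cr)] for fr, cr in zip(feats, ctx)]  # residual
    poses = []
    for box, h in zip(boxes, ctx):
        dx, dy, dth = matvec(params["pose"], h)
        poses.append((box[0] + dx, box[1] + dy, box[2] + dth))
    # Assumption: size is shared across the trajectory, so pool over frames.
    pooled = [sum(col) / len(ctx) for col in zip(*ctx)]
    dl, dw = matvec(params["size"], pooled)
    mean_l = sum(b[3] for b in boxes) / len(boxes)
    mean_w = sum(b[4] for b in boxes) / len(boxes)
    return (mean_l + dl, mean_w + dw), poses


rng = random.Random(0)
params = init_params(rng)
# Hypothetical coarse track: 5 frames of a 4 m x 2 m object moving along x.
boxes = [(float(t), 0.0, 0.0, 4.0, 2.0) for t in range(5)]
frames = [[(b[0] + rng.uniform(-2, 2), rng.uniform(-1, 1)) for _ in range(10)]
          for b in boxes]
size, poses = refine_trajectory(frames, boxes, params)
```

Here `size` is a single `(length, width)` pair for the whole trajectory and `poses` holds one `(x, y, heading)` tuple per frame, mirroring the output structure described in the abstract. With random weights the numbers are meaningless; in a trained model the heads would be fit against human-annotated boxes.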