The conventional tracking paradigm takes in instantaneous measurements such as range and bearing and produces object tracks across time. In applications such as autonomous driving, lidar measurements in the form of point clouds are usually passed through a "virtual sensor" realized by a deep learning model to produce "measurements" such as bounding boxes, which are in turn ingested by a tracking module to produce object tracks. Very often, multiple lidar sweeps are accumulated in a buffer and merged to form the input to the virtual sensor. We argue in this paper that such an input already contains temporal information, and therefore the virtual sensor output should also contain temporal information, not just instantaneous values for the time corresponding to the end of the buffer. In particular, we present a deep learning model called MULti-Sweep PAired Detector (MULSPAD) that produces, for each detected object, a pair of bounding boxes: one at the end time and one at the beginning time of the input buffer. This is achieved with fairly straightforward changes to commonly used lidar detection models and with only marginal extra processing, yet the resulting symmetry is satisfying. Such paired detections make it possible not only to construct rudimentary trackers fairly easily, but also to construct more sophisticated trackers that exploit the extra information conveyed by the pair and are robust to choices of motion models and object birth/death models. We have conducted preliminary training and experimentation using the Waymo Open Dataset, the results of which show the efficacy of our proposed method.
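To illustrate how paired detections could make a rudimentary tracker easy to build, the following is a minimal sketch and not the method described in the paper. It assumes, hypothetically, that the detector is run with a stride equal to the buffer span, so that the start-time box of each new detection refers to the same instant as the end-time boxes from the previous run. The names `Box`, `PairedDetection`, and `PairedTracker`, the greedy nearest-center association, and the `gate` distance threshold are all illustrative assumptions; boxes are reduced to 2D centers for brevity.

```python
from dataclasses import dataclass
from math import hypot
from typing import Dict, List


@dataclass(frozen=True)
class Box:
    """Hypothetical box reduced to its 2D center (size and heading omitted)."""
    x: float
    y: float


@dataclass(frozen=True)
class PairedDetection:
    """One detector output: the same object at buffer start and buffer end."""
    start: Box
    end: Box


def center_distance(a: Box, b: Box) -> float:
    return hypot(a.x - b.x, a.y - b.y)


class PairedTracker:
    """Greedy tracker that links a detection's start box to a track's last end box.

    Assumption: consecutive detector runs are spaced by one buffer span, so
    start boxes and the previous end boxes describe the same instant and no
    motion model is needed for association.
    """

    def __init__(self, gate: float = 1.0) -> None:
        self.gate = gate                    # maximum center distance for a match
        self.tracks: Dict[int, Box] = {}    # track id -> latest end-time box
        self._next_id = 0

    def update(self, detections: List[PairedDetection]) -> List[int]:
        """Return one track id per detection, in input order."""
        # All (distance, track id, detection index) candidates, closest first.
        candidates = sorted(
            (center_distance(prev_end, det.start), tid, i)
            for tid, prev_end in self.tracks.items()
            for i, det in enumerate(detections)
        )
        assigned: Dict[int, int] = {}       # detection index -> track id
        used_tracks = set()
        for dist, tid, i in candidates:
            if dist > self.gate:
                break                       # remaining candidates are farther still
            if tid in used_tracks or i in assigned:
                continue
            assigned[i] = tid
            used_tracks.add(tid)

        new_tracks: Dict[int, Box] = {}
        ids: List[int] = []
        for i, det in enumerate(detections):
            tid = assigned.get(i)
            if tid is None:                 # unmatched detection starts a new track
                tid = self._next_id
                self._next_id += 1
            new_tracks[tid] = det.end       # the end box becomes the track state
            ids.append(tid)
        self.tracks = new_tracks            # unmatched tracks are dropped immediately
        return ids


tracker = PairedTracker(gate=1.0)
ids1 = tracker.update([
    PairedDetection(Box(0.0, 0.0), Box(1.0, 0.0)),
    PairedDetection(Box(10.0, 10.0), Box(10.0, 11.0)),
])
ids2 = tracker.update([
    PairedDetection(Box(10.1, 11.0), Box(10.0, 12.0)),
    PairedDetection(Box(1.1, 0.0), Box(2.0, 0.0)),
])
ids3 = tracker.update([PairedDetection(Box(50.0, 50.0), Box(51.0, 50.0))])
```

In this sketch the association step compares boxes that refer to the same time instant, which is why no prediction step appears. Track birth and death are handled in the crudest possible way (immediate creation and deletion); a more sophisticated tracker would replace both the greedy matching and this track management.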