Predicting how the world will evolve is crucial for motion planning in autonomous systems. Classical methods are limited because they rely on costly human annotations, such as semantic class labels, bounding boxes, and tracks or HD maps of cities, to plan their motion, and are therefore difficult to scale to large unlabeled datasets. One promising self-supervised task is 3D point cloud forecasting from unannotated LiDAR sequences. We show that this task requires algorithms to implicitly capture (1) sensor extrinsics (i.e., the egomotion of the autonomous vehicle), (2) sensor intrinsics (i.e., the sampling pattern specific to the particular LiDAR sensor), and (3) the shape and motion of other objects in the scene. However, autonomous systems should make predictions about the world and not about their sensors. To this end, we factor out (1) and (2) by recasting the task as one of spacetime (4D) occupancy forecasting. Because ground-truth 4D occupancy is expensive to obtain, we render point cloud data from 4D occupancy predictions given sensor extrinsics and intrinsics, which allows occupancy algorithms to be trained and tested with unannotated LiDAR sequences. This also allows point cloud forecasting algorithms to be evaluated and compared across diverse datasets, sensors, and vehicles.
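To make the rendering step concrete, the following is a minimal sketch of how a point cloud could be rendered from a predicted occupancy grid once the sensor pose and ray pattern are known. It is an illustration under stated assumptions, not the renderer used in this work: it uses simple fixed-step ray marching with a hard occupancy threshold, and the names `render_points`, `render_sequence`, `voxel_size`, `step`, and `threshold` are hypothetical. The sensor origin stands in for the extrinsics and the set of ray directions stands in for the intrinsics.

```python
import numpy as np


def render_points(occupancy, origin, directions, voxel_size=1.0,
                  max_range=50.0, step=0.25, threshold=0.5):
    """Render one LiDAR sweep from a single 3D occupancy grid.

    occupancy:  (X, Y, Z) array of occupancy values in [0, 1]
    origin:     (3,) sensor position in the grid's metric frame (extrinsics)
    directions: (N, 3) ray directions, one per LiDAR beam (intrinsics)
    Returns (points, depths, has_hit); depths is only meaningful where
    has_hit is True.
    """
    origin = np.asarray(origin, dtype=float)
    directions = np.asarray(directions, dtype=float)
    directions = directions / np.linalg.norm(directions, axis=1, keepdims=True)

    # Sample depths along every ray: shape (S,)
    depths = np.arange(step, max_range + step, step)
    # Sample positions for all rays: shape (N, S, 3)
    samples = origin[None, None, :] + depths[None, :, None] * directions[:, None, :]

    # Convert metric positions to voxel indices and mask out-of-grid samples
    idx = np.floor(samples / voxel_size).astype(int)
    shape = np.array(occupancy.shape)
    inside = np.all((idx >= 0) & (idx < shape), axis=-1)
    idx = np.clip(idx, 0, shape - 1)
    occ = occupancy[idx[..., 0], idx[..., 1], idx[..., 2]]

    # A ray "returns" at the first sample that lands in an occupied voxel
    hit = inside & (occ > threshold)
    has_hit = hit.any(axis=1)
    first = hit.argmax(axis=1)  # index of first True; 0 if the ray never hits
    hit_depths = depths[first]

    points = origin + hit_depths[has_hit, None] * directions[has_hit]
    return points, hit_depths, has_hit


def render_sequence(occupancy_4d, origins, directions, **kwargs):
    """Render one point cloud per future time step from a (T, X, Y, Z) forecast."""
    return [render_points(occ_t, o_t, directions, **kwargs)[0]
            for occ_t, o_t in zip(occupancy_4d, origins)]
```

Because the origin and ray directions are inputs rather than something the forecaster must predict, the same occupancy forecast can in principle be rendered for a different sensor or trajectory by swapping those arguments, which is the property that makes cross-dataset comparison possible.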