Autonomous driving systems require fast and robust perception of the surrounding environment to carry out their routines effectively. To avoid collisions and drive safely, they rely heavily on object detection. However, 2D object detection alone is insufficient: safer planning requires additional information such as relative velocity and distance. Monocular 3D object detectors attempt to address this problem by directly predicting 3D bounding boxes and object velocities from a camera image. Recent research estimates time-to-contact on a per-pixel basis and suggests that it is a more effective measure than velocity and depth combined. However, per-pixel time-to-contact still needs object detection to serve its purpose, so two different models must run, which increases the overall computational cost. To address this issue, we propose per-object time-to-contact estimation, which extends object detection models to additionally predict a time-to-contact attribute for each object. We compare our approach with existing time-to-contact methods and provide benchmarking results on well-known datasets. Our approach achieves higher precision than prior art while using only a single image.
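To make the per-object formulation concrete, here is a minimal sketch assuming the common constant-velocity definition of time-to-contact: distance divided by closing speed. The `Detection` class, its `ttc` field, and the helpers `ttc_from_depth` and `most_urgent` are hypothetical illustrations, not the authors' model. Their approach predicts the attribute directly from a single image rather than computing it from depth and velocity.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    # Hypothetical per-object output: a 2D box plus a time-to-contact attribute.
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    label: str
    ttc: float  # seconds until contact; math.inf if the object is not approaching


def ttc_from_depth(depth_m: float, closing_speed_mps: float) -> float:
    """Constant-velocity time-to-contact: distance / closing speed.

    A positive closing speed means the object and the camera are approaching;
    zero or negative means no contact is expected, so we return infinity.
    """
    if closing_speed_mps <= 0:
        return math.inf
    return depth_m / closing_speed_mps


def most_urgent(detections: list[Detection]) -> Detection:
    # A single scalar per object makes prioritisation a simple minimum.
    return min(detections, key=lambda d: d.ttc)


# Usage: a car 20 m away closing at 5 m/s (4 s) versus a pedestrian
# 6 m away closing at 2 m/s (3 s).
dets = [
    Detection((0.0, 0.0, 10.0, 10.0), "car", ttc_from_depth(20.0, 5.0)),
    Detection((20.0, 20.0, 40.0, 40.0), "pedestrian", ttc_from_depth(6.0, 2.0)),
]
print(most_urgent(dets).label)  # prints "pedestrian"
```

The example shows why a per-object time-to-contact is convenient for planning: it folds distance and relative velocity into one quantity attached to each detection, so no separate per-pixel map has to be matched to the boxes afterwards.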