Grid-Centric Traffic Scenario Perception for Autonomous Driving: A Comprehensive Review

Grid-centric perception is a crucial field for mobile robot perception and navigation. Nonetheless, it is less prevalent than object-centric perception in autonomous driving, because autonomous vehicles need to accurately perceive highly dynamic, large-scale traffic scenarios and the complexity and computational cost of grid-centric perception are high. In recent years, the rapid development of deep learning techniques and hardware has provided fresh insights into the evolution of grid-centric perception. The fundamental difference between grid-centric and object-centric pipelines is that grid-centric perception follows a geometry-first paradigm, which is more robust to open-world driving scenarios with endless long-tailed, semantically unknown obstacles. Recent research demonstrates the advantages of grid-centric perception, such as comprehensive fine-grained environmental representation, greater robustness to occlusion and irregularly shaped objects, better ground estimation, and safer planning policies. There is also a growing trend of expanding the capacity of occupancy networks to 4D scene perception and prediction, and the latest techniques are closely related to new research topics in autonomous driving such as 4D occupancy forecasting, generative AI, and world models. Given the lack of current surveys of this rapidly expanding field, we present a hierarchically structured review of grid-centric perception for autonomous vehicles. We organize previous and current knowledge of occupancy grid techniques along the main thread from 2D BEV grids to 3D occupancy to 4D occupancy forecasting. We additionally summarize label-efficient occupancy learning and the role of grid-centric perception in driving systems. Lastly, we summarize current research trends and provide a future outlook.
