Estimating the 3D structure of the drivable surface and surrounding environment is a crucial task for assisted and autonomous driving. It is commonly solved either with 3D sensors such as LiDAR or by directly predicting the depth of points via deep learning. However, the former is expensive, and the latter does not exploit the geometry of the scene. In this paper, instead of following existing methodologies, we propose the Road Planar Parallax Attention Network (RPANet), a new deep neural network for 3D sensing from monocular image sequences based on planar parallax, which takes full advantage of the omnipresent road plane geometry in driving scenes. RPANet takes a pair of images aligned by the homography of the road plane as input and outputs a $\gamma$ map (the per-pixel ratio of height to depth) for 3D reconstruction. The $\gamma$ map can be used to construct a two-dimensional transformation between two consecutive frames. It implies planar parallax and, combined with the road plane as a reference, allows the 3D structure to be estimated by warping the consecutive frames. Furthermore, we introduce a novel cross-attention module that helps the network better perceive the displacements caused by planar parallax. To verify the effectiveness of our method, we sample data from the Waymo Open Dataset and construct annotations related to planar parallax. Comprehensive experiments on the sampled dataset demonstrate the 3D reconstruction accuracy of our approach in challenging scenarios.
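To make the role of the $\gamma$ map concrete, the following minimal sketch shows one way depth and height could be recovered from it once the road plane is known. It is not the RPANet implementation. It assumes a pinhole camera with intrinsics `K`, a unit road normal `normal` expressed in the camera frame and pointing from the camera toward the road, and a known camera height `cam_height`; the function name and all parameters are hypothetical. Under these assumptions a pixel with back-projected ray $r = K^{-1}\tilde{p}$ and depth $Z$ has height $h = d - Z\,(n \cdot r)$ above the road, so $\gamma = h / Z$ gives $Z = d / (\gamma + n \cdot r)$ and $h = \gamma Z$.

```python
import numpy as np


def depth_height_from_gamma(gamma, K, normal, cam_height, eps=1e-6):
    """Recover per-pixel depth Z and height h from a gamma map (gamma = h / Z).

    Assumed conventions (illustrative, not taken from the paper):
      - K is the 3x3 pinhole intrinsic matrix.
      - normal is the unit road normal in the camera frame, pointing from
        the camera toward the road.
      - cam_height is the camera height above the road, so road points X
        satisfy normal . X = cam_height.
    """
    gamma = np.asarray(gamma, dtype=np.float64)
    rows, cols = gamma.shape

    # Homogeneous pixel coordinates (u, v, 1) for every pixel.
    u, v = np.meshgrid(np.arange(cols), np.arange(rows))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)

    # Back-project to rays with unit z component: r = K^{-1} p.
    rays = pix @ np.linalg.inv(K).T
    n_dot_r = rays @ np.asarray(normal, dtype=np.float64)

    # Z = d / (gamma + n . r); keep only pixels with a positive, finite depth.
    denom = gamma + n_dot_r
    valid = denom > eps
    depth = np.full(gamma.shape, np.nan)
    depth[valid] = cam_height / denom[valid]

    # h = gamma * Z (NaN wherever the depth is undefined).
    height = gamma * depth
    return depth, height


# Toy example: camera 1.5 m above the road, y axis pointing down.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
gamma = np.zeros((48, 64))   # gamma = 0 everywhere means "on the road"
gamma[29, 37] = 0.1          # one pixel belonging to a raised point
depth, height = depth_height_from_gamma(gamma, K, normal=[0.0, 1.0, 0.0],
                                        cam_height=1.5)
```

In this toy setup, pixels at or above the horizon with $\gamma = 0$ have no valid intersection with the road and are returned as NaN. A real pipeline would take $\gamma$ from the network output and the plane parameters from calibration or estimation.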