Bird's Eye View (BEV) representations are tremendously useful for perception-related automated driving tasks. However, generating BEVs from surround-view fisheye camera images is challenging due to the strong distortions introduced by such wide-angle lenses. We take the first step in addressing this challenge and introduce a baseline, F2BEV, to generate BEV height maps and semantic segmentation maps from fisheye images. F2BEV consists of a distortion-aware spatial cross-attention module, which queries and consolidates spatial information from fisheye image features in a transformer-style architecture, followed by a task-specific head. We evaluate single-task and multi-task variants of F2BEV on our synthetic FB-SSEM dataset; all variants generate better BEV height and segmentation maps (in terms of IoU) than a state-of-the-art BEV generation method operating on undistorted fisheye images. We also demonstrate height map generation from real-world fisheye images using F2BEV. An initial sample of our dataset is publicly available at https://tinyurl.com/58jvnscy.
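Since the comparison above is reported in terms of IoU, a minimal sketch of a per-class IoU computation on discrete BEV label maps may help make the metric concrete. The function name `per_class_iou`, the integer class ids, and the toy 2x3 maps are hypothetical and chosen only for illustration; the abstract does not state the exact evaluation protocol (for example, how height maps are discretized or which classes are scored), so this should not be read as the authors' evaluation code.

```python
def per_class_iou(pred, target, num_classes):
    """Per-class intersection-over-union for two 2D integer label maps.

    pred and target are equally sized nested lists of class ids in
    [0, num_classes). Returns a list with one IoU per class, or None
    for a class that appears in neither map.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # The cell agrees: it counts once toward both
                # intersection and union of that class.
                intersection[p] += 1
                union[p] += 1
            else:
                # The cell disagrees: it enlarges the union of the
                # predicted class and of the ground-truth class.
                union[p] += 1
                union[t] += 1
    return [i / u if u else None for i, u in zip(intersection, union)]


# Toy BEV grids with three hypothetical classes (0, 1, 2).
pred = [[0, 0, 1],
        [1, 2, 2]]
target = [[0, 1, 1],
          [1, 2, 0]]

ious = per_class_iou(pred, target, num_classes=3)
valid = [v for v in ious if v is not None]
mean_iou = sum(valid) / len(valid)
```

Averaging only over classes that occur in at least one of the two maps is one common convention; other conventions (such as accumulating intersections and unions over the whole dataset before dividing) give different numbers, so the choice should be stated alongside any reported score.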