RoadFormer: Duplex Transformer for RGB-Normal Semantic Road Scene Parsing

Recent advances in deep convolutional neural networks have shown significant promise for road scene parsing. Nevertheless, existing work focuses primarily on freespace detection, with little attention given to hazardous road defects that could compromise both driving safety and comfort. In this paper, we introduce RoadFormer, a novel Transformer-based data-fusion network developed for road scene parsing. RoadFormer uses a duplex encoder architecture to extract heterogeneous features from both RGB images and surface normal information. The encoded features are then fed into a novel heterogeneous feature synergy block for effective feature fusion and recalibration. The pixel decoder learns multi-scale long-range dependencies from the fused and recalibrated heterogeneous features, which are subsequently processed by a Transformer decoder to produce the final semantic prediction. Additionally, we release SYN-UDTIRI, the first large-scale road scene parsing dataset that contains over 10,407 RGB images, dense depth images, and the corresponding pixel-level annotations for both freespace and road defects of different shapes and sizes. Extensive experimental evaluations conducted on our SYN-UDTIRI dataset, as well as on three public datasets (KITTI road, CityScapes, and ORFD), demonstrate that RoadFormer outperforms all other state-of-the-art networks for road scene parsing. Specifically, RoadFormer ranks first on the KITTI road benchmark. Our source code, created dataset, and demo video are publicly available at mias.group/RoadFormer.
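The abstract does not say how the surface normal input is obtained. As a minimal sketch, assuming the normals are estimated from a dense depth image (an assumption, not a method stated above), the following treats depth as a height field over the pixel grid and uses finite differences. The function name `normals_from_depth` is hypothetical, and camera intrinsics are ignored for simplicity.

```python
import math


def normals_from_depth(depth):
    """Estimate a unit surface normal per pixel from a dense depth map.

    Assumption: depth is treated as a height field z(x, y) over the pixel
    grid (camera intrinsics are ignored), so the normal is proportional to
    (-dz/dx, -dz/dy, 1).
    """
    h, w = len(depth), len(depth[0])
    normals = []
    for y in range(h):
        row = []
        for x in range(w):
            # Central differences inside the image, one-sided at the borders.
            x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
            dzdx = (depth[y][x1] - depth[y][x0]) / (x1 - x0) if x1 > x0 else 0.0
            dzdy = (depth[y1][x] - depth[y0][x]) / (y1 - y0) if y1 > y0 else 0.0
            nx, ny, nz = -dzdx, -dzdy, 1.0
            norm = math.sqrt(nx * nx + ny * ny + nz * nz)
            row.append((nx / norm, ny / norm, nz / norm))
        normals.append(row)
    return normals


# Usage: a planar depth map that rises by 0.5 per pixel along x.
plane = [[0.5 * x + 2.0 for x in range(5)] for _ in range(4)]
plane_normals = normals_from_depth(plane)
```

On a planar depth map every pixel receives the same normal, while a defect such as a pothole changes the normal direction sharply at its rim. That kind of geometric cue is one plausible reason to pair a normal map with the RGB image in a two-branch encoder.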

