RoadFormer: Duplex Transformer for RGB-Normal Semantic Road Scene Parsing

The recent advancements in deep convolutional neural networks have shown significant promise in the domain of road scene parsing. Nevertheless, the existing works focus primarily on freespace detection, with little attention given to hazardous road defects that could compromise both driving safety and comfort. In this paper, we introduce RoadFormer, a novel Transformer-based data-fusion network developed for road scene parsing. RoadFormer utilizes a duplex encoder architecture to extract heterogeneous features from both RGB images and surface normal information. The encoded features are subsequently fed into a novel heterogeneous feature synergy block for effective feature fusion and recalibration. The pixel decoder then learns multi-scale long-range dependencies from the fused and recalibrated heterogeneous features, which are subsequently processed by a Transformer decoder to produce the final semantic prediction. Additionally, we release SYN-UDTIRI, the first large-scale road scene parsing dataset that contains over 10,407 RGB images, dense depth images, and the corresponding pixel-level annotations for both freespace and road defects of different shapes and sizes. Extensive experimental evaluations conducted on our SYN-UDTIRI dataset, as well as on three public datasets, including KITTI road, CityScapes, and ORFD, demonstrate that RoadFormer outperforms all other state-of-the-art networks for road scene parsing. Specifically, RoadFormer ranks first on the KITTI road benchmark. Our source code, created dataset, and demo video are publicly available at mias.group/RoadFormer.
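The abstract does not say how the surface normal input is obtained. As a minimal sketch, assuming it is estimated from a dense depth map such as those shipped with SYN-UDTIRI, the pure-Python example below uses finite differences. The function name `normals_from_depth` is hypothetical, and treating the depth map as a height field on a unit pixel grid (ignoring camera intrinsics) is a simplifying assumption, not the authors' method.

```python
import math


def normals_from_depth(depth):
    """Estimate one unit surface normal per pixel from a dense depth map.

    Assumption (not from the paper): the depth map is treated as a height
    field z(x, y) on a unit pixel grid, so the normal is proportional to
    (-dz/dx, -dz/dy, 1). Camera intrinsics are ignored.
    """
    rows, cols = len(depth), len(depth[0])
    normals = [[None] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            # Central differences in the interior, one-sided at the borders.
            x0, x1 = max(x - 1, 0), min(x + 1, cols - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, rows - 1)
            dzdx = (depth[y][x1] - depth[y][x0]) / max(x1 - x0, 1)
            dzdy = (depth[y1][x] - depth[y0][x]) / max(y1 - y0, 1)
            nx, ny, nz = -dzdx, -dzdy, 1.0
            length = math.sqrt(nx * nx + ny * ny + nz * nz)
            normals[y][x] = (nx / length, ny / length, nz / length)
    return normals


# Usage: a flat surface and a surface whose depth grows linearly along x.
flat = [[2.0] * 4 for _ in range(3)]
ramp = [[float(x) for x in range(4)] for _ in range(3)]
flat_normals = normals_from_depth(flat)
ramp_normals = normals_from_depth(ramp)
```

On the flat surface every normal points straight along the z axis, while on the ramp the normals tilt against the direction of increasing depth. A per-pixel normal map of this kind could be fed to one encoder alongside the RGB image fed to the other.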

