RoadFormer: Duplex Transformer for RGB-Normal Semantic Road Scene Parsing

Recent advances in deep convolutional neural networks have shown significant promise for road scene parsing. Nevertheless, existing work focuses primarily on freespace detection and gives little attention to hazardous road defects that can compromise both driving safety and comfort. In this paper, we introduce RoadFormer, a novel Transformer-based data-fusion network developed for road scene parsing. RoadFormer uses a duplex encoder architecture to extract heterogeneous features from both RGB images and surface normal information. The encoded features are then fed into a novel heterogeneous feature synergy block for effective feature fusion and recalibration. The pixel decoder learns multi-scale long-range dependencies from the fused and recalibrated heterogeneous features, which are subsequently processed by a Transformer decoder to produce the final semantic prediction. Additionally, we release SYN-UDTIRI, the first large-scale road scene parsing dataset, which contains over 10,407 RGB images, dense depth images, and the corresponding pixel-level annotations for both freespace and road defects of different shapes and sizes. Extensive experimental evaluations conducted on our SYN-UDTIRI dataset, as well as on three public datasets (KITTI road, CityScapes, and ORFD), demonstrate that RoadFormer outperforms all other state-of-the-art networks for road scene parsing. Specifically, RoadFormer ranks first on the KITTI road benchmark. Our source code, dataset, and demo video are publicly available at mias.group/RoadFormer.
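
The abstract pairs RGB images with surface normal information and notes that the dataset ships dense depth images. As a rough illustration of how a normal map might be derived from a depth map, here is a minimal sketch. It is not the authors' estimator: the function name `normals_from_depth` is hypothetical, and the code assumes the depth map can be treated as a height field z(x, y) on a unit pixel grid, ignoring camera intrinsics.

```python
import math


def normals_from_depth(depth):
    """Estimate one unit surface normal per pixel from a 2D depth map.

    Simplifying assumption (not from the paper): the depth map is treated as
    a height field z(x, y) on a unit pixel grid, so intrinsics are ignored.
    `depth` is a non-empty list of equal-length rows of floats.
    """
    h, w = len(depth), len(depth[0])
    normals = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Central differences in the interior, one-sided at the borders.
            x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
            dzdx = (depth[y][x1] - depth[y][x0]) / max(x1 - x0, 1)
            dzdy = (depth[y1][x] - depth[y0][x]) / max(y1 - y0, 1)
            # The normal of the surface z(x, y) is proportional to
            # (-dz/dx, -dz/dy, 1); divide by its length to normalize.
            norm = math.sqrt(dzdx * dzdx + dzdy * dzdy + 1.0)
            normals[y][x] = (-dzdx / norm, -dzdy / norm, 1.0 / norm)
    return normals


if __name__ == "__main__":
    # A flat surface and a ramp whose depth grows by one unit per pixel in x.
    flat = [[2.0] * 4 for _ in range(3)]
    ramp = [[float(x) for x in range(4)] for _ in range(3)]
    print(normals_from_depth(flat)[1][1])
    print(normals_from_depth(ramp)[1][1])
```

Under this assumption, a flat region yields normals pointing straight along the z axis, while a defect such as a pothole edge produces normals that tilt sharply. That is one intuition for why a normal map could complement RGB texture when parsing road defects.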

