Self-supervised Multi-task Learning Framework for Safety and Health-Oriented Connected Driving Environment Perception using Onboard Camera

Cutting-edge connected vehicle (CV) technologies have drawn much attention in recent years. The real-time traffic data captured by a CV can be shared with other CVs and data centers, opening new possibilities for solving diverse transportation problems. However, imagery captured by onboard cameras in a connected environment has not been sufficiently investigated, especially for safety and health-oriented visual perception. In this paper, a bidirectional process of image synthesis and decomposition (BPISD) approach is proposed, and on that basis a novel self-supervised multi-task learning framework that simultaneously estimates depth map, atmospheric visibility, airlight, and PM2.5 mass concentration. Depth map and visibility are considered highly associated with traffic safety, while airlight and PM2.5 mass concentration are directly correlated with human health. Both the training and testing phases of the proposed system require only a single image as input. Owing to the innovative training pipeline, the depth estimation network can handle various levels of visibility and overcome inherent problems in current image-synthesis-based depth estimation, thereby generating high-quality depth maps even in low-visibility situations and, in turn, supporting accurate estimation of visibility, airlight, and PM2.5 mass concentration. Extensive experiments on data synthesized from the KITTI dataset and on real-world data collected in Beijing demonstrate that the proposed method can (1) achieve depth estimation performance competitive with state-of-the-art methods when taking clear images as input; (2) predict clear depth maps for images contaminated by various levels of haze; and (3) accurately estimate visibility, airlight, and PM2.5 mass concentration. Beneficial applications can be developed based on the presented work to improve traffic safety, air quality, and public health.
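The abstract does not state the image formation model behind the synthesis and decomposition steps. As a minimal sketch only, the following assumes the widely used atmospheric scattering model, I = J·t + A·(1 − t) with t = exp(−β·d), and a Koschmieder-style relation between the extinction coefficient β and visibility. All function names, parameter names, and numeric values are hypothetical and chosen for illustration; this is not the authors' implementation, and it works on a single pixel intensity rather than a full image.

```python
import math

# Assumption: the standard atmospheric scattering model, not confirmed by the abstract.
# All names and values below are illustrative.

def transmission(depth_m, beta):
    """Fraction of scene light that survives a path of depth_m metres."""
    return math.exp(-beta * depth_m)

def synthesize_haze(clear, depth_m, airlight, beta):
    """Forward step: blend a clear intensity with airlight according to depth."""
    t = transmission(depth_m, beta)
    return clear * t + airlight * (1.0 - t)

def decompose_haze(hazy, depth_m, airlight, beta):
    """Inverse step: recover the clear intensity from a hazy one."""
    t = transmission(depth_m, beta)
    return (hazy - airlight * (1.0 - t)) / t

def visibility_m(beta, contrast_threshold=0.02):
    """Distance at which apparent contrast drops to the given threshold."""
    return -math.log(contrast_threshold) / beta

# One pixel: intensity 0.30 at 50 m, airlight 0.90, extinction 0.02 per metre.
clear, depth_m, airlight, beta = 0.30, 50.0, 0.90, 0.02
hazy = synthesize_haze(clear, depth_m, airlight, beta)
recovered = decompose_haze(hazy, depth_m, airlight, beta)
```

Under this assumed model, the forward and inverse steps share the same depth, airlight, and extinction parameters, which illustrates why a depth estimate and the atmospheric quantities can be coupled in one framework.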
