Street Scene Semantic Understanding (denoted as TriSU) is a crucial but complex task for autonomous driving (AD) vehicles deployed worldwide (e.g., Tesla). Its inference model generalizes poorly because of inter-city domain shift. Hierarchical Federated Learning (HFL) offers a potential solution for improving TriSU model generalization, but it converges slowly because the vehicles' surroundings are heterogeneous across cities. Going beyond existing HFL works, which have limited capability on complex tasks, we propose a rapidly converging heterogeneous HFL framework (FedRC) that addresses inter-city data heterogeneity and accelerates HFL model convergence. In FedRC, both individual RGB images and whole RGB datasets are modelled as Gaussian distributions when designing the HFL aggregation weights. This approach not only differentiates individual RGB samples instead of treating them as equal, as is typical, but also considers both data volume and statistical properties rather than data quantity alone. Extensive experiments on the TriSU task using cross-city datasets demonstrate that FedRC converges faster than the state-of-the-art benchmark by 38.7%, 37.5%, 35.5%, and 40.6% in terms of mIoU, mPrecision, mRecall, and mF1, respectively. Furthermore, qualitative evaluations in the CARLA simulation environment confirm that the proposed FedRC framework delivers top-tier performance.
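The abstract does not state the aggregation-weight formula, so the following is only an illustrative sketch of the general idea of combining data volume with Gaussian statistics, not the FedRC rule itself. Everything in it is an assumption made for illustration: images are reduced to flat lists of intensities of equal length, each dataset Gaussian is obtained by moment matching, and the hypothetical weight is the client's sample count multiplied by exp(-KL) between the client Gaussian and a pooled global Gaussian. The names `image_gaussian`, `dataset_gaussian`, `kl_gaussian`, `aggregation_weights`, and `EPS` are likewise hypothetical.

```python
import math
from statistics import fmean, pvariance

EPS = 1e-8  # hypothetical floor that keeps variances strictly positive


def image_gaussian(pixels):
    """Summarise one image (a flat list of intensities) as (mean, variance)."""
    return fmean(pixels), pvariance(pixels) + EPS


def dataset_gaussian(images):
    """Moment-match an equal-weight mixture of per-image Gaussians."""
    stats = [image_gaussian(img) for img in images]
    means = [m for m, _ in stats]
    # Law of total variance: within-image variance + spread of image means.
    return fmean(means), fmean([v for _, v in stats]) + pvariance(means)


def kl_gaussian(p, q):
    """KL(p || q) for univariate Gaussians given as (mean, variance)."""
    (mp, vp), (mq, vq) = p, q
    return 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)


def aggregation_weights(clients):
    """Assumed rule (not from the paper): volume times exp(-KL to global)."""
    sizes = [len(images) for images in clients]
    gaussians = [dataset_gaussian(images) for images in clients]
    total = sum(sizes)
    # Pool the client Gaussians, weighting each by its number of images.
    mu_g = sum(n * m for n, (m, _) in zip(sizes, gaussians)) / total
    var_g = sum(
        n * (v + (m - mu_g) ** 2) for n, (m, v) in zip(sizes, gaussians)
    ) / total
    scores = [
        n * math.exp(-kl_gaussian(g, (mu_g, var_g)))
        for n, g in zip(sizes, gaussians)
    ]
    norm = sum(scores)
    return [s / norm for s in scores]


# Toy data: three "cities" with two tiny images each (values are made up).
clients = [
    [[0.4, 0.6], [0.4, 0.6]],  # city A
    [[0.4, 0.6], [0.4, 0.6]],  # city B, same statistics as A
    [[0.8, 1.0], [0.8, 1.0]],  # city C, brighter scenes
]
weights = aggregation_weights(clients)
```

Under this assumed rule, clients with identical statistics are weighted purely by data volume, which recovers the usual quantity-only weighting as a special case, while a client whose statistics sit far from the pooled distribution receives a smaller share. Whether FedRC rewards or penalizes such divergence is not stated in the abstract.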