The use of federated learning (FL) in vehicular ad hoc networks (VANETs) has attracted significant research interest because it reduces transmission overhead and protects user privacy by exchanging gradients computed on local datasets instead of raw data. However, implementing FL in VANETs faces several challenges, including limited communication resources, high vehicle mobility, and the statistical diversity of data distributions. To tackle these issues, this paper introduces a novel framework for hierarchical federated learning (HFL) over a multi-hop, clustering-based VANET. The proposed method uses a weighted combination of the average relative speed and the cosine similarity of FL model parameters as a clustering metric, so that both data diversity and high vehicle mobility are taken into account. This metric ensures convergence with minimal changes in cluster heads (CHs) while addressing the complexities of non-independent and identically distributed (non-IID) data. Additionally, the framework includes a novel mechanism for managing seamless CH transitions, after which the most recent FL model parameters are transferred to the newly designated CH. Furthermore, the proposed approach considers merging CHs in order to reduce their number and, consequently, the associated overhead. Extensive simulations demonstrate that the proposed hierarchical federated learning over clustered VANET significantly improves accuracy and convergence time while maintaining an acceptable level of packet overhead, compared with previously proposed clustering algorithms and with non-clustered VANET.
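To make the clustering metric more concrete, the following is a minimal sketch of how a weighted combination of average relative speed and cosine similarity of FL model parameters could be computed for a candidate cluster head. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the function names, the weight `alpha`, the normalization constant `max_speed`, the use of absolute speed differences, and the convention that a lower score (low relative mobility, high parameter similarity) is preferable.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length parameter vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # undefined for zero vectors; treated as "no similarity"
    return dot / (norm_a * norm_b)


def average_relative_speed(own_speed, neighbor_speeds):
    """Mean absolute speed difference between a vehicle and its neighbors."""
    if not neighbor_speeds:
        return 0.0
    total = sum(abs(own_speed - s) for s in neighbor_speeds)
    return total / len(neighbor_speeds)


def clustering_score(own_speed, neighbor_speeds, own_params, neighbor_params,
                     alpha=0.5, max_speed=40.0):
    """Hypothetical weighted clustering score in [0, 1]; lower is better.

    alpha weights the mobility term against the model-dissimilarity term.
    max_speed (m/s) is an assumed constant used only to normalize speeds.
    """
    # Mobility term: normalized average relative speed, clipped to [0, 1].
    mobility = min(average_relative_speed(own_speed, neighbor_speeds) / max_speed, 1.0)

    # Model term: mean cosine similarity to neighbors, mapped from [-1, 1]
    # to a dissimilarity in [0, 1].
    if neighbor_params:
        sims = [cosine_similarity(own_params, p) for p in neighbor_params]
        mean_sim = sum(sims) / len(sims)
    else:
        mean_sim = 0.0
    dissimilarity = (1.0 - mean_sim) / 2.0

    return alpha * mobility + (1.0 - alpha) * dissimilarity
```

Under these assumptions, each vehicle would compute `clustering_score` against its neighbors, and the vehicle with the lowest score in a neighborhood would be a natural cluster-head candidate. Raising `alpha` favors cluster stability under mobility, while lowering it favors grouping vehicles whose local models point in similar directions.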