Machine learning typically assumes that training and test data follow the same distribution and that the data is stored centrally. In real-world scenarios, however, distributions may differ significantly, and data is often spread across different devices, organizations, or edge nodes. It is therefore imperative to develop models that generalize effectively to unseen distributions when the training data is scattered across different domains. In response to this challenge, interest in federated domain generalization (FDG) has surged in recent years. FDG combines the strengths of federated learning (FL) and domain generalization (DG) to enable multiple source domains to collaboratively learn a model that generalizes directly to unseen domains while preserving data privacy. However, generalizing a federated model under domain shift is a technically challenging problem that has so far received scant attention. This paper presents the first survey of recent advances in this area. First, we trace the development from traditional machine learning to domain adaptation and domain generalization, leading to FDG, and provide the corresponding formal definition. Then, we categorize recent methodologies into four classes (federated domain alignment, data manipulation, learning strategies, and aggregation optimization) and present suitable algorithms for each category in detail. Next, we introduce commonly used datasets, applications, evaluations, and benchmarks. Finally, we conclude the survey by outlining potential topics for future research.