Modern machine learning (ML) models have grown to a scale where training them on a single machine is impractical. As a result, there is a growing trend toward leveraging federated learning (FL) techniques to train large ML models in a distributed and collaborative manner. These models, however, may struggle to generalize when deployed on new devices due to domain shift. In this context, federated domain adaptation (FDA) emerges as a powerful approach to address this challenge. Most existing FDA approaches focus on aligning the source and target domains by minimizing a distance (e.g., MMD) between their distributions. Such strategies, however, inevitably introduce high communication overhead and can be highly sensitive to network reliability. In this paper, we introduce RF-TCA, an enhancement of the standard Transfer Component Analysis (TCA) approach that significantly accelerates computation without compromising theoretical or empirical performance. Leveraging the computational advantage of RF-TCA, we further extend it to the FDA setting with FedRF-TCA. The proposed FedRF-TCA protocol has communication complexity that is independent of the sample size, while maintaining performance comparable to, or even surpassing, state-of-the-art FDA methods. We present extensive experiments to showcase the superior performance and robustness (to network conditions) of FedRF-TCA.
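To give intuition for the sample-size-independent communication claim, the following is a minimal sketch (not the paper's actual protocol) of how random Fourier features let two parties estimate an MMD-style distribution distance by exchanging only fixed-dimensional mean embeddings. All data shapes, the bandwidth `sigma`, and the feature dimension `D` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_map(X, W, b):
    """Random Fourier features approximating an RBF kernel:
    z(x) = sqrt(2/D) * cos(W^T x + b)."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Illustrative source/target data held by two different clients.
n_s, n_t, d, D = 500, 400, 10, 128
Xs = rng.normal(size=(n_s, d))              # source client's data
Xt = rng.normal(loc=0.5, size=(n_t, d))     # target client's data (shifted)

# Random projection shared by all clients (e.g., via a common seed).
sigma = 1.0
W = rng.normal(scale=1.0 / sigma, size=(d, D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

# Each client transmits only its D-dimensional mean embedding,
# so communication cost is O(D), independent of n_s and n_t.
mu_s = rff_map(Xs, W, b).mean(axis=0)
mu_t = rff_map(Xt, W, b).mean(axis=0)

# Approximate squared MMD between the two domains.
mmd2 = float(np.sum((mu_s - mu_t) ** 2))
```

Here the key point is that `mu_s` and `mu_t` have fixed size `D` no matter how many samples each client holds, which is the property FedRF-TCA exploits to keep communication independent of the sample size.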