The Dog Walking Theory: Rethinking Convergence in Federated Learning

Federated learning (FL) is a collaborative learning paradigm that allows different clients to train one powerful global model without sharing their private data. Although FL has demonstrated promising results in various applications, it is known to suffer from convergence issues caused by data distribution shift across clients, especially when the client data are not independent and identically distributed (non-IID). In this paper, we study the convergence of FL on non-IID data and propose a novel \emph{Dog Walking Theory} to formulate and identify the missing element in existing research. The Dog Walking Theory describes a dog walker leash-walking multiple dogs from one side of a park to the other. The dog walker's goal is to arrive at the right destination while giving the dogs enough exercise (i.e., space exploration). In FL, the server is analogous to the dog walker and the clients are analogous to the dogs. This analogy allows us to identify one crucial yet missing element in existing FL algorithms: the leash that guides the exploration of the clients. To address this gap, we propose a novel FL algorithm, \emph{FedWalk}, that uses an external, easy-to-converge task at the server side as a \emph{leash task} to guide the local training of the clients. We theoretically analyze the convergence of FedWalk with respect to data heterogeneity (between the server and the clients) and task discrepancy (between the leash task and the original task). Experiments on multiple benchmark datasets demonstrate the superiority of FedWalk over state-of-the-art FL methods under both IID and non-IID settings.
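The convergence issue on non-IID data mentioned above can be illustrated with a small toy problem. The sketch below is not FedWalk and contains no leash task; it is plain federated averaging on two hypothetical one-dimensional quadratic clients, with curvatures and optima that are made-up values chosen only for illustration. Under these assumptions, one local step per round recovers the minimizer of the average loss, whereas several local steps let the clients wander toward their own optima, so the averaged model settles away from that minimizer.

```python
# Toy illustration of client drift in federated averaging on non-IID data.
# This is NOT FedWalk: there is no leash task here, only plain averaging.
# Each client i holds a 1-D quadratic loss f_i(w) = 0.5 * a_i * (w - c_i)^2.
# The (a_i, c_i) pairs below are hypothetical values chosen for illustration.

def local_sgd(w, a, c, lr, steps):
    """Run `steps` full-gradient steps on one client's quadratic loss."""
    for _ in range(steps):
        w -= lr * a * (w - c)
    return w

def fedavg(clients, lr=0.1, local_steps=1, rounds=500, w0=0.0):
    """Every client starts from the server model, trains locally, and the
    server averages the returned models with equal weights."""
    w = w0
    for _ in range(rounds):
        updates = [local_sgd(w, a, c, lr, local_steps) for a, c in clients]
        w = sum(updates) / len(updates)
    return w

# Two clients with different curvature a_i and different optimum c_i.
clients = [(1.0, -1.0), (4.0, 2.0)]

# Minimizer of the average loss: sum(a_i * c_i) / sum(a_i).
w_star = sum(a * c for a, c in clients) / sum(a for a, _ in clients)

w_one = fedavg(clients, local_steps=1)    # one local step per round
w_many = fedavg(clients, local_steps=10)  # ten local steps per round

print(f"global minimizer : {w_star:.4f}")
print(f"1 local step     : {w_one:.4f}")
print(f"10 local steps   : {w_many:.4f}")
```

In the analogy of the abstract, the ten-step run corresponds to dogs exploring without a leash: each client moves far toward its own optimum before the server averages, and the average no longer lands on the intended destination.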
