FeDXL: Provable Federated Learning for Deep X-Risk Optimization

In this paper, we tackle a novel federated learning (FL) problem for optimizing a family of X-risks, to which no existing FL algorithm is applicable. In particular, the objective has the form $\mathbb E_{z\sim S_1} f(\mathbb E_{z'\sim S_2} \ell(w; z, z'))$, where the two data sets $S_1, S_2$ are distributed over multiple machines, $\ell(\cdot)$ is a pairwise loss that depends only on the prediction outputs for the input data pair $(z, z')$, and $f(\cdot)$ is possibly a non-linear, non-convex function. This problem has important applications in machine learning, e.g., AUROC maximization with a pairwise loss and partial AUROC maximization with a compositional loss. The challenges in designing an FL algorithm for X-risks lie in the non-decomposability of the objective over multiple machines and the interdependency between different machines. To this end, we propose an active-passive decomposition framework that decouples the gradient's components into two types, namely active parts and passive parts: the active parts depend on local data and are computed with the local model, while the passive parts depend on data held by other machines and are communicated or computed based on historical models and samples. Under this framework, we develop two provable FL algorithms (FeDXL), for linear and nonlinear $f$ respectively, based on federated averaging and merging. We develop a novel theoretical analysis to handle the latency of the passive parts and the interdependency between the local model parameters and the data involved in computing local gradient estimators. We establish both iteration and communication complexities and show that using historical samples and models to compute the passive parts does not degrade the complexities. We conduct empirical studies of FeDXL for deep AUROC and partial AUROC maximization and demonstrate its performance compared with several baselines.
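The two difficulties named above, non-decomposability and the split into active and passive parts, can be illustrated with a small NumPy sketch. This is not the FeDXL algorithm; it is a toy under stated assumptions: a linear scorer $h(w; x) = w^\top x$, a squared-hinge pairwise surrogate on the score gap $h(w;z) - h(w;z')$, $S_1$ taken as positives and $S_2$ as negatives, and linear $f$ (plain pairwise AUROC). The names `xrisk`, `pair_loss`, `full_grad`, `active_passive_grad`, and `MARGIN` are hypothetical. Each machine computes gradients for its own data with the current model (active part) and pairs them with prediction scores of all data (passive part), which in federated training would be communicated and possibly computed with an older model.

```python
import numpy as np

# Assumptions (illustrative, not from the paper): linear scorer h(w; x) = w @ x,
# squared-hinge pairwise surrogate, S1 = positives, S2 = negatives, linear f.
MARGIN = 1.0


def pair_loss(d):
    """Squared-hinge surrogate l(d) = max(0, m - d)^2 on the score gap d = h(z) - h(z')."""
    return np.maximum(0.0, MARGIN - d) ** 2


def pair_loss_grad(d):
    """Derivative dl/dd of the squared-hinge surrogate."""
    return -2.0 * np.maximum(0.0, MARGIN - d)


def xrisk(w, X1, X2, f=lambda u: u):
    """E_{z~S1} f(E_{z'~S2} l(w; z, z')) over the given data."""
    gaps = (X1 @ w)[:, None] - (X2 @ w)[None, :]  # shape (n1, n2)
    inner = pair_loss(gaps).mean(axis=1)            # inner expectation over S2
    return f(inner).mean()                          # outer expectation over S1


def full_grad(w, X1, X2):
    """Exact gradient of the X-risk with linear f, using d(gap)/dw = x_i - x'_j."""
    g = pair_loss_grad((X1 @ w)[:, None] - (X2 @ w)[None, :])  # (n1, n2)
    n1, n2 = g.shape
    return (g.sum(axis=1) @ X1 - g.sum(axis=0) @ X2) / (n1 * n2)


def active_passive_grad(w, X1_local, X2_local, s1_passive, s2_passive, n1, n2):
    """One machine's gradient contribution with linear f.

    Active parts: scores and gradients of local data, computed with the local model w.
    Passive parts: prediction scores of all positives / negatives (s1_passive,
    s2_passive), assumed to be received from other machines, possibly stale.
    """
    # Local positives paired with every negative's passive score.
    g1 = pair_loss_grad((X1_local @ w)[:, None] - s2_passive[None, :])
    # Every positive's passive score paired with local negatives.
    g2 = pair_loss_grad(s1_passive[:, None] - (X2_local @ w)[None, :])
    return (g1.sum(axis=1) @ X1_local - g2.sum(axis=0) @ X2_local) / (n1 * n2)


rng = np.random.default_rng(0)
d, K = 5, 3
X1 = rng.normal(1.0, 1.0, size=(12, d))   # positives (S1)
X2 = rng.normal(-1.0, 1.0, size=(9, d))   # negatives (S2)
w = rng.normal(size=d)

# Distribute both data sets over K machines.
X1_parts, X2_parts = np.array_split(X1, K), np.array_split(X2, K)

# Non-decomposability: averaging per-machine objectives ignores cross-machine pairs.
global_obj = xrisk(w, X1, X2)
local_avg = np.mean([xrisk(w, a, b) for a, b in zip(X1_parts, X2_parts)])
print(f"global objective = {global_obj:.4f}, average of local objectives = {local_avg:.4f}")

# With up-to-date passive scores, the per-machine pieces sum to the exact gradient.
s1, s2 = X1 @ w, X2 @ w
agg = sum(active_passive_grad(w, a, b, s1, s2, len(X1), len(X2))
          for a, b in zip(X1_parts, X2_parts))
print(np.allclose(agg, full_grad(w, X1, X2)))  # True
```

If `s1` and `s2` are instead computed with an older copy of `w`, the aggregated vector is only an approximation of the gradient; that staleness is the latency of the passive parts that the analysis above addresses. Passing a nonlinear `f` to `xrisk` changes only the outer aggregation, but by the chain rule the gradient then involves $f'$ evaluated at the inner expectation, which this linear-$f$ sketch does not handle.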
