FeDXL: Provable Federated Learning for Deep X-Risk Optimization

In this paper, we tackle a novel federated learning (FL) problem for optimizing a family of X-risks, to which no existing FL algorithms are applicable. In particular, the objective has the form $\mathbb E_{z\sim S_1} f(\mathbb E_{z'\sim S_2} \ell(w; z, z'))$, where the two data sets $S_1, S_2$ are distributed over multiple machines, $\ell(\cdot)$ is a pairwise loss that depends only on the prediction outputs of the input data pair $(z, z')$, and $f(\cdot)$ is a possibly non-linear, non-convex function. This problem has important applications in machine learning, e.g., AUROC maximization with a pairwise loss and partial AUROC maximization with a compositional loss. The challenges in designing an FL algorithm for X-risks lie in the non-decomposability of the objective over multiple machines and the interdependency between different machines. To this end, we propose an active-passive decomposition framework that splits the gradient's components into two types, active parts and passive parts: the active parts depend on local data and are computed with the local model, while the passive parts depend on other machines and are communicated or computed based on historical models and samples. Under this framework, we develop two provable FL algorithms (FeDXL) for handling linear and nonlinear $f$, respectively, based on federated averaging and merging. We develop a novel theoretical analysis to combat the latency of the passive parts and the interdependency between the local model parameters and the data involved in computing local gradient estimators. We establish both iteration and communication complexities and show that using historical samples and models to compute the passive parts does not degrade the complexities. We conduct empirical studies of FeDXL for deep AUROC and partial AUROC maximization, and demonstrate their performance compared with several baselines.
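
To make the objective $\mathbb E_{z\sim S_1} f(\mathbb E_{z'\sim S_2} \ell(w; z, z'))$ concrete, the following is a minimal single-machine sketch of its empirical value, computed from prediction scores. It is not the FeDXL algorithm. The names `x_risk` and `squared_hinge`, the margin of 1, and the choice of `math.log1p` as a nonlinear $f$ are illustrative assumptions, not taken from the paper.

```python
import math

def x_risk(scores_s1, scores_s2, pair_loss, f):
    """Empirical X-risk: average over z in S1 of f(average over z' in S2 of the pairwise loss).

    Hypothetical helper for illustration; inputs are prediction scores, since
    the pairwise loss depends only on the model outputs for (z, z').
    """
    total = 0.0
    for a in scores_s1:
        # Inner expectation over S2 for a fixed z in S1.
        inner = sum(pair_loss(a, b) for b in scores_s2) / len(scores_s2)
        # Outer function f is applied before averaging over S1.
        total += f(inner)
    return total / len(scores_s1)

def squared_hinge(a, b, margin=1.0):
    # An assumed pairwise surrogate: penalizes pairs where a does not exceed b by the margin.
    return max(0.0, margin - (a - b)) ** 2

pos_scores = [2.0, 1.0]        # scores for samples in S1 (e.g. positives)
neg_scores = [0.0, 1.0, -1.0]  # scores for samples in S2 (e.g. negatives)

# Linear f(u) = u gives a plain pairwise objective, as in AUROC maximization.
linear_risk = x_risk(pos_scores, neg_scores, squared_hinge, lambda u: u)

# A nonlinear f gives a compositional objective; log1p is only a stand-in here.
nonlinear_risk = x_risk(pos_scores, neg_scores, squared_hinge, math.log1p)
```

In the federated setting described above, the scores in `neg_scores` would partly reside on other machines, so the inner average cannot be formed from local data alone. That is the non-decomposability the abstract refers to.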
