Sparse basis recovery is a classical statistical learning problem that arises when the model dimension $p$ is much larger than the number of samples $n$. However, little work has studied sparse basis recovery in the Federated Learning (FL) setting, where client data must additionally be protected under differential privacy (DP). In particular, the performance guarantees of existing DP-FL algorithms (such as DP-SGD) degrade significantly when $p \gg n$, so these algorithms fail to learn the true underlying sparse model accurately. In this work, we develop a new differentially private sparse basis recovery algorithm for the FL setting, called SPriFed-OMP. SPriFed-OMP adapts Orthogonal Matching Pursuit (OMP) to the FL setting. Further, it combines secure multi-party computation (SMPC) with DP so that only a small amount of noise needs to be added to achieve differential privacy. As a result, SPriFed-OMP can efficiently recover the true sparse basis of a linear model with only $n = O(\sqrt{p})$ samples. We further present an enhanced version of our approach, SPriFed-OMP-GRAD, which is based on gradient privatization and improves the performance of SPriFed-OMP. Our theoretical analysis and empirical results demonstrate that both SPriFed-OMP and SPriFed-OMP-GRAD terminate in a small number of steps and significantly outperform the previous state-of-the-art DP-FL solutions in terms of the accuracy-privacy trade-off.
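For readers unfamiliar with OMP, the greedy selection loop that the abstract builds on can be sketched as follows. This is a minimal, centralized illustration of classical OMP only; it is not SPriFed-OMP and includes no federation, SMPC, or privacy accounting. The function name `omp_support`, the `noise_std` parameter, and the toy data are all hypothetical: `noise_std` merely perturbs each correlation with Gaussian noise to suggest where a privacy mechanism might act, and it is not calibrated to any DP guarantee.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def omp_support(columns, y, sparsity, noise_std=0.0, rng=None):
    """Greedy OMP: return (support, residual) after at most `sparsity` steps.

    columns   -- list of p feature columns, each a list of n floats
    noise_std -- hypothetical knob: std of Gaussian noise added to each
                 correlation; NOT calibrated to any (epsilon, delta) budget
    """
    rng = rng or random.Random(0)
    residual = list(y)
    basis = []    # orthonormal vectors spanning the selected columns
    support = []  # selected column indices, in selection order
    for _ in range(sparsity):
        # Pick the unselected column most correlated with the residual.
        best_j, best_score = None, -1.0
        for j, col in enumerate(columns):
            if j in support:
                continue
            corr = dot(col, residual)
            if noise_std > 0.0:
                corr += rng.gauss(0.0, noise_std)
            if abs(corr) > best_score:
                best_j, best_score = j, abs(corr)
        if best_j is None:
            break  # every column is already selected
        # Gram-Schmidt: strip the part of the new column already in the span.
        q = list(columns[best_j])
        for b in basis:
            coef = dot(b, q)
            q = [qi - coef * bi for qi, bi in zip(q, b)]
        norm = math.sqrt(dot(q, q))
        if norm < 1e-12:
            break  # new column is linearly dependent on the selected ones
        q = [qi / norm for qi in q]
        # Project the residual onto the orthogonal complement of the span.
        coef = dot(q, residual)
        residual = [ri - coef * qi for ri, qi in zip(residual, q)]
        basis.append(q)
        support.append(best_j)
    return support, residual


# Toy problem with p = 5 columns and n = 4 samples (so p > n).
columns = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.5, 0.5, 0.5, 0.5],
]
y = [0.0, 3.0, 0.0, 2.0]  # y = 3 * column 1 + 2 * column 3
support, residual = omp_support(columns, y, sparsity=2)
print(support)  # [1, 3]
```

Updating the residual through an orthonormal basis is equivalent to re-solving least squares on the selected columns at every step, which is the "orthogonal" part of OMP. The sketch keeps everything in plain Python lists so that it has no dependencies; a practical implementation would more likely use a linear algebra library.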