In this paper, we study the setting in which data owners collaboratively train machine learning models under a privacy notion called joint differential privacy [Kearns et al., 2018]. In this setting, the model trained for each data owner $j$ uses $j$'s own data without privacy constraints and the other owners' data with differential privacy guarantees. This setting was first studied in [Jain et al., 2021], with a focus on linear regression. Here, we study it for stochastic convex optimization (SCO). We present an algorithm that is a variant of DP-SGD [Song et al., 2013; Abadi et al., 2016] and prove theoretical bounds on its population loss. We compare our algorithm to several baselines and discuss the parameter regimes in which it is preferable. We also empirically study joint differential privacy for multi-class classification on two public datasets. Our empirical findings are consistent with the insights from our theoretical results.
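To make the setting concrete, the sketch below shows one plausible gradient step for owner $j$'s model. It is an illustration of the joint differential privacy idea under stated assumptions, not the algorithm proposed in this paper. Owner $j$'s per-example gradients are used as-is. Other owners' per-example gradients are clipped to L2 norm `clip_norm`, and their sum is perturbed with Gaussian noise, in the style of DP-SGD. The function names, the rule of averaging over all examples, and the noise multiplier `sigma` are hypothetical choices. Calibrating `sigma` to a target $(\varepsilon, \delta)$ is omitted.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def clip(g: Sequence[float], c: float) -> Vector:
    """Rescale g so its L2 norm is at most c (DP-SGD-style per-example clipping)."""
    norm = math.sqrt(sum(x * x for x in g))
    scale = min(1.0, c / norm) if norm > 0 else 1.0
    return [x * scale for x in g]


def joint_dp_step(
    w: Sequence[float],
    own_grads: Sequence[Sequence[float]],    # owner j's per-example gradients
    other_grads: Sequence[Sequence[float]],  # other owners' per-example gradients
    lr: float,
    clip_norm: float,
    sigma: float,                            # hypothetical noise multiplier
    rng: random.Random,
) -> Vector:
    """One hypothetical SGD step for owner j's model under joint DP (sketch only)."""
    d = len(w)
    total = [0.0] * d

    # Owner j's own data: no clipping and no noise.
    for g in own_grads:
        for k in range(d):
            total[k] += g[k]

    # Other owners' data: clip each example, then add Gaussian noise to the sum,
    # so the step is differentially private with respect to their examples.
    for g in other_grads:
        cg = clip(g, clip_norm)
        for k in range(d):
            total[k] += cg[k]
    for k in range(d):
        total[k] += rng.gauss(0.0, sigma * clip_norm)

    # Assumption: average over all examples in the batch.
    n = max(1, len(own_grads) + len(other_grads))
    return [w[k] - lr * total[k] / n for k in range(d)]
```

For example, with `sigma=0.0`, an own gradient `[1.0, 0.0]`, and another owner's gradient `[3.0, 4.0]` clipped to norm 1 (giving `[0.6, 0.8]`), a step from `w=[0.0, 0.0]` with `lr=1.0` yields approximately `[-0.8, -0.4]`. Owner $j$'s own gradient passes through unclipped, which is the asymmetry that joint differential privacy permits.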