The ubiquity of distributed machine learning (ML) in sensitive public-domain applications calls for algorithms that protect data privacy while being robust to faults and adversarial behaviors. Although privacy and robustness have been extensively studied independently in distributed ML, their synthesis remains poorly understood. We present the first tight analysis of the error incurred by any algorithm ensuring robustness against a fraction of adversarial machines, as well as differential privacy (DP) for honest machines' data against any other curious entity. Our analysis exhibits a fundamental trade-off between privacy, robustness, and utility. To prove our lower bound, we consider the case of mean estimation, subject to distributed DP and robustness constraints, and devise reductions to centralized estimation of one-way marginals. We prove our matching upper bound by presenting a new distributed ML algorithm using a high-dimensional robust aggregation rule. The latter amortizes the dependence on the dimension in the error (caused by adversarial workers and DP), while being agnostic to the statistical properties of the data.
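The mean-estimation setting described above can be illustrated with a minimal sketch, under loudly stated assumptions. Each honest worker clips its vector and adds Gaussian noise before reporting it (a common way to obtain DP, with the noise scale `sigma` left uncalibrated here), and the server combines all reports with a robust rule. The abstract does not specify its aggregation rule, so the sketch substitutes a coordinate-wise trimmed mean as a stand-in; this is not the authors' high-dimensional rule and is not expected to achieve the dimension amortization the abstract claims. All function and parameter names are hypothetical.

```python
import numpy as np


def clip_and_noise(x, clip_norm, sigma, rng):
    """Clip x to L2 norm at most clip_norm, then add Gaussian noise.

    sigma is a free parameter here; calibrating it to a target
    (epsilon, delta) DP guarantee is outside the scope of this sketch.
    """
    norm = np.linalg.norm(x)
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return x * scale + rng.normal(0.0, sigma, size=x.shape)


def trimmed_mean(vectors, num_trim):
    """Coordinate-wise trimmed mean (stand-in robust rule, an assumption).

    For each coordinate, drop the num_trim smallest and num_trim largest
    values, then average what remains.
    """
    v = np.sort(np.asarray(vectors, dtype=float), axis=0)
    n = v.shape[0]
    if 2 * num_trim >= n:
        raise ValueError("num_trim must be smaller than half the number of vectors")
    return v[num_trim:n - num_trim].mean(axis=0)


def private_robust_mean(honest, adversarial, clip_norm, sigma, seed=0):
    """Estimate the honest mean from noisy honest reports plus arbitrary ones.

    Honest workers privatize their vectors locally; adversarial workers
    may send anything. The server trims as many values per side as there
    are adversarial workers (assumed known for this illustration).
    """
    rng = np.random.default_rng(seed)
    reports = [clip_and_noise(x, clip_norm, sigma, rng) for x in honest]
    reports.extend(np.asarray(a, dtype=float) for a in adversarial)
    return trimmed_mean(reports, num_trim=len(adversarial))


if __name__ == "__main__":
    honest = [np.array([1.0, 0.0]) for _ in range(5)]
    adversarial = [np.array([1e6, -1e6])]
    print(private_robust_mean(honest, adversarial, clip_norm=10.0, sigma=0.1))
```

Increasing `sigma` strengthens privacy but adds error, and a larger adversarial fraction forces more trimming, which gives a small-scale view of the privacy, robustness, and utility trade-off the abstract refers to.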