Tuning hyperparameters in federated machine learning can substantially impact model performance. When hyperparameters are tuned on sensitive data, privacy becomes an important challenge, and differential privacy has emerged as the de facto standard for provable privacy guarantees. A standard setting in federated learning is that clients agree on a shared setup, i.e., they must find a compromise among a set of candidate hyperparameters, such as a model's learning rate. Yet prior work on privacy-preserving hyperparameter tuning is tailored to specific learning tasks, does not account for the privacy leakage of aggregated results, or offers a suboptimal privacy-utility trade-off. In this work, we present DP-Hype, an algorithm that performs a federated, privacy-preserving hyperparameter search by conducting a federated vote based on clients' local hyperparameter evaluations. In this way, DP-Hype selects hyperparameters that constitute a compromise supported by a majority of clients, while remaining scalable and independent of specific learning tasks. We prove that DP-Hype preserves a strong notion of differential privacy, namely client-level differential privacy, and, importantly, show that its privacy guarantees do not depend on the number of hyperparameters. We also provide bounds on its utility, that is, the probability of finding good hyperparameters, and implement DP-Hype as a submodule of the popular Flower framework for federated machine learning. Finally, we evaluate DP-Hype on multiple benchmark datasets in IID as well as several non-IID settings and demonstrate high utility even under small privacy budgets.
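To make the idea of a private federated vote concrete, here is a minimal sketch. The abstract does not specify DP-Hype's voting rule or noise mechanism, so everything below is an assumption for illustration. Each client approves every candidate whose local validation loss is within a hypothetical `tolerance` of its own best, and the server then picks a winner with the standard exponential mechanism, implemented via the Gumbel-max trick. This is a swapped-in textbook technique, not the authors' method. Because one client changes each approval count by at most 1, sampling with probability proportional to exp(ε · count / 2) is ε-differentially private at the client level, and that guarantee holds regardless of how many candidates there are. The function names `local_votes` and `dp_select` are hypothetical, and the sketch assumes a trusted aggregator that sees the individual vote vectors.

```python
import numpy as np


def local_votes(losses, tolerance):
    """Client side (hypothetical rule): approve every candidate whose local
    validation loss is within `tolerance` of this client's best loss."""
    losses = np.asarray(losses, dtype=float)
    return (losses <= losses.min() + tolerance).astype(int)


def dp_select(vote_vectors, epsilon, rng):
    """Server side: exponential mechanism over approval counts.

    Adding, removing, or replacing one client changes each count by at most 1
    (sensitivity 1), so sampling index i with probability proportional to
    exp(epsilon * count_i / 2) is epsilon-DP at the client level, independent
    of the number of candidates. Assumes a trusted aggregator."""
    counts = np.sum(vote_vectors, axis=0)
    scores = epsilon * counts / 2.0
    # Gumbel-max trick: argmax(score + i.i.d. standard Gumbel noise)
    # is a sample from softmax(score), i.e. the exponential mechanism.
    noisy_scores = scores + rng.gumbel(size=counts.shape)
    return int(np.argmax(noisy_scores)), counts


# Usage: three clients rate three candidate learning rates (made-up losses).
learning_rates = [1e-3, 1e-2, 1e-1]
client_losses = [
    [0.40, 0.35, 0.90],
    [0.60, 0.42, 0.80],
    [0.30, 0.36, 0.70],
]
votes = [local_votes(l, tolerance=0.1) for l in client_losses]
idx, counts = dp_select(votes, epsilon=1.0, rng=np.random.default_rng(0))
print("selected learning rate:", learning_rates[idx], "approval counts:", counts)
```

In this toy setup, the learning rate 1e-2 is approved by all three clients, which is the kind of majority-supported compromise the abstract describes. With a small ε, however, the noisy selection may still return another candidate.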