Selecting the right privacy-preserving mechanism for federated query processing over multiple private data silos is challenging. Numerous such mechanisms exist, including secure multi-party computation (SMC), approximate query processing with differential privacy (DP), combined SMC and DP, DP-based data obfuscation, and federated learning. These mechanisms make different trade-offs among accuracy, privacy, execution efficiency, and storage efficiency. In this work, we first introduce a new privacy-preserving technique that uses a deep learning model, trained with the Differentially-Private Stochastic Gradient Descent (DP-SGD) algorithm, to replace portions of the actual data when answering a query. We then demonstrate a novel declarative privacy-preserving workflow that allows users to specify "what private information to protect" rather than "how to protect it". Under the hood, the system relies on a cost model to automatically choose privacy-preserving mechanisms and their hyper-parameters. At the same time, the proposed workflow allows human experts to review and tune the selected privacy-preserving mechanism for audit, compliance, and optimization purposes.
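For readers unfamiliar with DP-SGD, the following is a minimal sketch of a single update step as the algorithm is commonly described: clip each per-example gradient to a fixed L2 norm, sum the clipped gradients, add Gaussian noise scaled to the clipping norm, then average and take a gradient step. It is not the authors' implementation. The linear model, the squared loss, and the names `clip`, `dp_sgd_step`, `max_norm`, and `noise_multiplier` are all assumptions chosen for illustration, and privacy accounting (tracking the resulting privacy budget) is omitted.

```python
import math
import random


def clip(grad, max_norm):
    """Scale grad down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(weights, batch, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD update for a linear model with squared loss (illustrative).

    batch is a list of (features, target) pairs; rng is a random.Random.
    """
    summed = [0.0] * len(weights)
    for x, y in batch:
        # Per-example gradient of (w . x - y)^2 with respect to w.
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        grad = [2.0 * err * xi for xi in x]
        # Clip each example's gradient before aggregating.
        for i, g in enumerate(clip(grad, max_norm)):
            summed[i] += g
    # Add Gaussian noise with standard deviation noise_multiplier * max_norm
    # to the summed gradient, then average over the batch.
    sigma = noise_multiplier * max_norm
    noisy = [(s + rng.gauss(0.0, sigma)) / len(batch) for s in summed]
    return [w - lr * g for w, g in zip(weights, noisy)]
```

A call such as `dp_sgd_step([0.0], [([1.0], 1.0)], lr=0.1, max_norm=1.0, noise_multiplier=1.1, rng=random.Random(0))` performs one noisy update. In this sketch, a larger `noise_multiplier` gives stronger privacy at the cost of accuracy, which is the kind of hyper-parameter trade-off the abstract says the cost model is meant to choose automatically.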