Differential privacy (DP) is by far the most widely accepted framework for mitigating privacy risks in machine learning. However, exactly how small the privacy parameter $\epsilon$ needs to be to protect against certain privacy risks in practice is still not well understood. In this work, we study data reconstruction attacks for discrete data and analyze them under the framework of multiple hypothesis testing. We use different variants of the celebrated Fano's inequality to derive upper bounds on the inferential power of a data reconstruction adversary when the model is trained with differential privacy. Importantly, we show that if the underlying private data takes values from a set of size $M$, then the target privacy parameter $\epsilon$ can be $O(\log M)$ before the adversary gains significant inferential power. Our analysis offers theoretical evidence for the empirical effectiveness of DP against data reconstruction attacks even at relatively large values of $\epsilon$.
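To make the $O(\log M)$ scaling concrete, the sketch below evaluates the textbook form of Fano's inequality, $P_{\text{err}} \ge 1 - (I(X;Y) + \log 2)/\log M$, under two assumptions that are ours and not taken from the paper: the private record $X$ is uniform over the $M$ candidates, and the mutual information between $X$ and the model output $Y$ is at most $\epsilon$ nats (which holds for pure $\epsilon$-DP when only that record varies). The function names are hypothetical, and the paper's own bounds rely on other variants of Fano's inequality, so its constants may differ.

```python
import math


def fano_error_lower_bound(epsilon: float, M: int) -> float:
    """Lower bound on the adversary's reconstruction error probability.

    Assumptions (illustrative, not from the paper): X is uniform over M
    candidates and I(X; Y) <= epsilon, measured in nats.
    """
    if M < 2:
        raise ValueError("M must be at least 2")
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    bound = 1.0 - (epsilon + math.log(2)) / math.log(M)
    # The bound is vacuous once it drops below zero.
    return max(0.0, bound)


def max_epsilon_for_error(target_error: float, M: int) -> float:
    """Largest epsilon for which the bound above still guarantees an error
    probability of at least target_error. Grows linearly in log M."""
    if not 0.0 <= target_error < 1.0:
        raise ValueError("target_error must lie in [0, 1)")
    return (1.0 - target_error) * math.log(M) - math.log(2)


if __name__ == "__main__":
    for M in (2**10, 2**20, 2**40):
        eps = max_epsilon_for_error(0.5, M)
        print(f"M = 2^{int(math.log2(M))}: epsilon up to {eps:.2f} "
              f"keeps the error bound at {fano_error_lower_bound(eps, M):.2f}")
```

Under these assumptions, the tolerable $\epsilon$ for a fixed error guarantee grows in proportion to $\log M$, which matches the qualitative claim of the abstract.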