Reconstruction attacks on machine learning (ML) models pose a serious risk of leaking sensitive data. In specific contexts, an adversary can (almost) perfectly reconstruct training data samples from a trained model using the model's gradients. When training ML models with differential privacy (DP), formal upper bounds on the success of such reconstruction attacks can be provided. So far, these bounds have been formulated under worst-case assumptions that might not be realistic in practice. In this work, we provide formal upper bounds on reconstruction success under realistic adversarial settings against ML models trained with DP, and we support these bounds with empirical results. With this, we show that in realistic scenarios, (a) the expected reconstruction success can be bounded appropriately in different contexts and by different metrics, which (b) allows for a more educated choice of a privacy parameter.
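To make the threat concrete, the following is a minimal sketch of the kind of gradient-based reconstruction the abstract refers to, in the spirit of gradient-inversion attacks such as "Deep Leakage from Gradients"; it is not the specific attack or bound analyzed in this work, and the toy model, input shapes, and optimizer settings are illustrative assumptions.

```python
# Sketch of a gradient-inversion reconstruction attack (illustrative only):
# the adversary observes the gradient of a private sample and optimizes a
# dummy input so that its gradient matches the observed one.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # assumed toy model
loss_fn = nn.CrossEntropyLoss()

# Gradient observed by the adversary for one (private) training sample.
x_true = torch.rand(1, 1, 28, 28)
y_true = torch.tensor([3])
observed_grads = torch.autograd.grad(
    loss_fn(model(x_true), y_true), model.parameters())

# Adversary's dummy data, optimized to reproduce the observed gradient.
x_dummy = torch.rand(1, 1, 28, 28, requires_grad=True)
y_dummy = torch.randn(1, 10, requires_grad=True)  # soft label, also optimized
optimizer = torch.optim.LBFGS([x_dummy, y_dummy])

def closure():
    optimizer.zero_grad()
    dummy_loss = loss_fn(model(x_dummy), y_dummy.softmax(dim=-1))
    dummy_grads = torch.autograd.grad(
        dummy_loss, model.parameters(), create_graph=True)
    # Squared L2 distance between dummy and observed gradients.
    grad_diff = sum(((dg - og) ** 2).sum()
                    for dg, og in zip(dummy_grads, observed_grads))
    grad_diff.backward()
    return grad_diff

for _ in range(100):
    optimizer.step(closure)
# If the attack succeeds, x_dummy now approximates x_true.
```

DP training perturbs the per-sample gradients (clipping plus calibrated noise), which is precisely what degrades the gradient-matching objective above and what the formal reconstruction bounds quantify.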