Machine learning systems can produce personalized outputs that allow an adversary to infer sensitive input attributes at inference time. We introduce Robust Privacy (RP), an inference-time privacy notion inspired by certified robustness: if a model's prediction is provably invariant within a radius-$R$ neighborhood around an input $x$ (e.g., under the $\ell_2$ norm), then $x$ enjoys $R$-Robust Privacy, i.e., an adversary observing the prediction cannot distinguish $x$ from any input within distance $R$ of $x$. We further develop Attribute Privacy Enhancement (APE) to translate this input-level invariance into an attribute-level privacy effect. In a controlled recommendation task where the decision depends primarily on a sensitive attribute, we show that RP enlarges the set of sensitive-attribute values compatible with a positive recommendation, widening the adversary's inference interval accordingly. Finally, we empirically demonstrate that RP also mitigates model inversion attacks (MIAs) by masking fine-grained input-output dependence. Even at a small noise level ($\sigma = 0.1$), RP reduces the attack success rate (ASR) from 73% to 4%, at the cost of a partial degradation in model performance. RP can also partially mitigate MIAs (e.g., ASR drops to 44%) with no degradation in model performance.
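The prediction-invariance guarantee behind RP can be obtained in the style of randomized smoothing from the certified-robustness literature: classify many Gaussian-noised copies of the input and certify an $\ell_2$ radius from the top class's vote share. The sketch below is a minimal, hypothetical illustration of that mechanism (the function name `smoothed_predict` and the toy classifier are assumptions for illustration, not the paper's implementation); any input within the returned radius of $x$ would receive the same smoothed prediction, which is the invariance RP relies on.

```python
import numpy as np
from scipy.stats import norm

def smoothed_predict(f, x, sigma=0.1, n=1000, rng=None):
    """Majority-vote prediction of base classifier f under Gaussian noise
    (randomized-smoothing style). Hypothetical sketch: f maps an input
    array to a class label; sigma matches the RP noise level."""
    rng = rng or np.random.default_rng(0)
    votes = {}
    for _ in range(n):
        y = f(x + rng.normal(0.0, sigma, size=x.shape))
        votes[y] = votes.get(y, 0) + 1
    top, count = max(votes.items(), key=lambda kv: kv[1])
    # Clamp the empirical top-class probability away from 1.0 so the
    # certified radius stays finite with a finite sample.
    p = min(count / n, 1.0 - 1e-6)
    # Certified L2 radius in the style of Cohen et al.: sigma * Phi^{-1}(p),
    # valid only when the top class has majority probability.
    radius = sigma * norm.ppf(p) if p > 0.5 else 0.0
    return top, radius

# Toy usage: a threshold classifier on a 1-D input.
label, radius = smoothed_predict(lambda v: int(v.sum() > 0), np.array([0.2]))
```

In this toy run the input sits 0.2 above the decision boundary with noise $\sigma = 0.1$, so the smoothed classifier votes for class 1 with high probability and certifies a small positive radius; under RP, observing that prediction cannot distinguish the true input from any neighbor within that radius.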