Robust Privacy: Inference-Stage Privacy through Certified Robustness

An adversary observing a model's released prediction can infer sensitive attributes of the queried input, or even reconstruct representatives of the model's training data. The inference interface thus acts as a side channel for privacy leakage. We introduce Robust Privacy (RP), an inference-stage privacy notion inspired by certified robustness: if a model's prediction is provably invariant within a radius-$R$ neighborhood around an input $x$ with confidence at least $1-\alpha$, then $x$ enjoys $(R,\alpha)$-Robust Privacy. Under this guarantee, we prove that any adversary observing the released prediction has advantage at most $\alpha/2$ in distinguishing $x$ from any input within distance $R$ of $x$. Building on RP, we formalize Robust Attribute Privacy (RAP), an attribute-level privacy notion that characterizes the set of sensitive-attribute values that remain compatible with a released prediction. On a classification task, RP increases the median length of the RAP-compatible inference interval from 23.50 to 29.96, reducing attribute-inference precision. Model inversion attacks, often treated as a training-stage threat, in fact rely on fine-grained signals leaked through the inference interface; RP masks these signals at the inference stage, reducing attack success rate (ASR) from 73% to 4% on a black-box inversion attack. Because it targets the leakage channel directly, RP dominates DP-SGD and randomized response in the privacy-utility tradeoff space: RP retains 98.4% accuracy at 21% ASR, whereas DP-SGD must drop accuracy to 61.7% to reach a comparable ASR. In both experiments, increasing the smoothing sample size $N$ improves privacy and utility at the same time. Finally, we examine model distillation as a scope boundary and show that RP mitigates attribute-level and instance-level inference-stage privacy leakage, but not function-level extraction through model distillation.
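To make the certification idea concrete, the following is a minimal sketch of a smoothed prediction with a certified radius. It assumes the standard randomized-smoothing recipe from the certified-robustness literature (Gaussian noise, majority vote, radius $\sigma\,\Phi^{-1}(p_{\text{lower}})$) with a simple Hoeffding lower bound. The abstract does not specify the exact procedure, so the function `smoothed_predict`, the toy classifier, and the parameters `sigma`, `n`, and `alpha` are illustrative assumptions, not the authors' implementation.

```python
import math
import random
from collections import Counter
from statistics import NormalDist


def smoothed_predict(classify, x, sigma, n, alpha, rng):
    """Majority vote over n Gaussian-perturbed copies of x (hypothetical helper).

    Returns (label, radius). The label is None when no radius can be certified.
    Assumes 0 < alpha < 1.
    """
    votes = Counter(
        classify([xi + rng.gauss(0.0, sigma) for xi in x]) for _ in range(n)
    )
    label, count = votes.most_common(1)[0]
    # Hoeffding lower bound on the top-class probability; it holds with
    # probability at least 1 - alpha over the sampled noise.
    p_lower = count / n - math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    if p_lower <= 0.5:
        return None, 0.0  # abstain: the vote is too close to certify
    # L2 radius within which the smoothed prediction cannot change.
    radius = sigma * NormalDist().inv_cdf(p_lower)
    return label, radius


def toy_classify(x):
    """Toy stand-in for a real model: thresholds the first coordinate."""
    return int(x[0] > 0.0)


rng = random.Random(0)
x = [2.0, 0.0]
label_small, radius_small = smoothed_predict(toy_classify, x, 0.5, 100, 0.001, rng)
label_large, radius_large = smoothed_predict(toy_classify, x, 0.5, 1000, 0.001, rng)
print(label_small, radius_small)
print(label_large, radius_large)
```

In this sketch a larger sample size tightens the confidence bound, so the same input receives a larger certified radius. This is consistent with the abstract's observation that increasing $N$ helps, although the paper's own mechanism may differ in its details.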
