Neural networks are susceptible to privacy attacks. To date, no verifier can reason about the privacy of individuals participating in the training set. We propose a new privacy property, called local differential classification privacy (LDCP), extending local robustness to a differential privacy setting suitable for black-box classifiers. Given a neighborhood of inputs, a classifier is LDCP if it classifies all inputs in the neighborhood the same regardless of whether it is trained with the full dataset or with any single entry omitted. A naive algorithm is highly impractical because it involves training a very large number of networks and verifying local robustness of the given neighborhood separately for every network. We propose Sphynx, an algorithm that computes, with high probability, an abstraction of all networks from a small set of networks, and verifies LDCP directly on the abstract network. The challenge is twofold: network parameters do not adhere to a known probability distribution, making it difficult to predict an abstraction, and predicting too large an abstraction harms the verification. Our key idea is to transform the parameters into a distribution given by kernel density estimation (KDE), which keeps the over-approximation error small. To verify LDCP, we extend a mixed-integer linear programming (MILP) verifier to analyze an abstract network. Experimental results show that by training only 7% of the networks, Sphynx predicts an abstract network obtaining 93% verification accuracy and reducing the analysis time by a factor of $1.7\cdot10^4$.
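To make the LDCP definition and the cost of the naive algorithm concrete, the following is a minimal sketch, not Sphynx itself. It assumes a toy nearest-centroid classifier in place of a neural network and an $L_\infty$ box as the neighborhood; all function names (`train_centroids`, `classify`, `box_corners`, `naive_ldcp_check`) and the data are hypothetical. For this toy classifier every decision region is convex, so checking the corners of the box is exact; for neural networks this shortcut does not hold and a real verifier (such as a MILP-based one) is needed.

```python
import itertools


def train_centroids(data):
    """Toy 'training': one mean vector per label (nearest-centroid classifier)."""
    sums, counts = {}, {}
    for x, y in data:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def classify(centroids, x):
    """Return the label of the closest centroid (ties go to the smallest label)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(sorted(centroids), key=lambda y: sq_dist(centroids[y]))


def box_corners(x, eps):
    """Center and corners of the L_inf ball of radius eps around x."""
    corners = [
        tuple(v + d for v, d in zip(x, deltas))
        for deltas in itertools.product((-eps, eps), repeat=len(x))
    ]
    return [tuple(x)] + corners


def naive_ldcp_check(data, x, eps):
    """Naive check: retrain once per omitted entry and test every model.

    This trains len(data) + 1 models, which is what makes the naive
    approach impractical for real networks and large training sets.
    """
    full_model = train_centroids(data)
    label = classify(full_model, x)
    models = [full_model] + [
        train_centroids(data[:i] + data[i + 1:]) for i in range(len(data))
    ]
    points = box_corners(x, eps)
    return all(classify(m, p) == label for m in models for p in points)


# Hypothetical 1-D data: class 0 contains an outlier at 3.0.
data = [((0.0,), 0), ((0.0,), 0), ((3.0,), 0), ((10.0,), 1), ((10.0,), 1)]

# Robust under the full model, but omitting the entry 3.0 flips the label.
print(naive_ldcp_check(data, (5.2,), 0.1))  # False
# Far from every leave-one-out decision boundary.
print(naive_ldcp_check(data, (2.0,), 0.5))  # True
```

The first query shows why local robustness of the fully trained classifier alone is not enough: the neighborhood of 5.2 is classified consistently by the full model, yet omitting a single training entry moves the decision boundary across it.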