Decaf: Data Distribution Decompose Attack against Federated Learning

In contrast to prevalent privacy inference techniques against Federated Learning (FL), such as generative adversarial network attacks, membership inference attacks, property inference attacks, and model inversion attacks, we devise a new privacy threat: the Data Distribution Decompose Attack on FL, termed Decaf. This attack enables an honest-but-curious FL server to profile the proportion of each class owned by a victim FL user, divulging sensitive information such as local market item distribution and business competitiveness. The crux of Decaf is the observation that the magnitude of local model gradient changes closely mirrors the underlying data distribution, including the proportion of each class. Decaf addresses two challenges: accurately identifying the missing/null class(es) of any victim user as a premise, and then quantifying the relationship between gradient changes and each remaining non-null class. Notably, Decaf is stealthy: it is entirely passive, so victim users cannot detect the infringement of their data distribution privacy. Experiments on five benchmark datasets (MNIST, FASHION-MNIST, CIFAR-10, FER-2013, and SkinCancer) with diverse model architectures, including customized convolutional networks and the standard VGG16 and ResNet18, demonstrate Decaf's efficacy. Results show that Decaf accurately decomposes local user data distribution, whether it is IID or non-IID. Specifically, the $L_{\infty}$ distance between the distribution decomposed by Decaf and the ground truth is consistently below 5\% when no null classes exist. Moreover, Decaf achieves 100\% accuracy in determining any victim user's null classes, a result validated through formal proof.
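
To make the reported metric concrete, the following minimal sketch computes the $L_{\infty}$ distance between a ground-truth class distribution and an estimated one, that is, the largest absolute gap in any single class proportion. It illustrates only the evaluation metric, not the Decaf attack itself. The function names `normalize` and `linf_distance` and all numeric values are hypothetical and chosen for illustration; they do not come from the paper.

```python
def normalize(counts):
    """Turn per-class sample counts into class proportions that sum to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("need at least one sample")
    return [c / total for c in counts]


def linf_distance(p, q):
    """L-infinity distance: the largest absolute per-class difference."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same classes")
    return max(abs(a - b) for a, b in zip(p, q))


# Hypothetical victim with three non-null classes (illustrative counts only).
truth = normalize([500, 300, 200])   # proportions 0.5, 0.3, 0.2
estimate = [0.48, 0.33, 0.19]        # hypothetical decomposed distribution

gap = linf_distance(truth, estimate)
print(f"L_inf distance: {gap:.2%}")
```

In this made-up case the worst per-class error is on the second class (0.33 versus 0.30), so the distance is about 3\%, which would fall under the 5\% bound quoted above.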
