When a person's records appear in k independent data silos, each protected by (epsilon, delta)-differential privacy, standard composition yields a valid (k*epsilon, k*delta)-DP guarantee for the joint output. This worst-case bound, however, does not answer the concrete inference question: at what k can an adversary actually identify a target person? This paper develops the information-theoretic framework needed to answer that question. We introduce cross-silo person-level DP (XSP-DP), a Pufferfish-style privacy notion whose adjacency relation captures all records of a single person across all silos simultaneously, and verify that the standard basic composition bound carries over to this adjacency model. Within this framework we prove that de-anonymization undergoes a phase transition at k* = Theta(log n / epsilon^2) (population size n, per-silo RR parameter epsilon): a Fano lower bound shows any estimator fails for k << k*, while a matching maximum-likelihood upper bound shows the attack succeeds for k >> k*. An explicit XOR + randomized-response construction demonstrates information synergy: each silo's output is individually uninformative about the target, yet the joint mutual information is strictly positive. For non-coordinated binary randomized-response mechanisms, we prove that de-anonymization is inevitable once k exceeds the threshold, establishing that cross-silo coordination is necessary. These results provide a baseline threat model and Theta-level threshold for cross-silo inference attacks under local DP.
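The synergy claim can be checked numerically on a toy instance. The sketch below is an illustrative assumption, not necessarily the construction used in the paper: silo 1 holds a uniform mask bit r, silo 2 holds r XOR s, where s is the target's uniform secret bit, and each silo releases its bit through binary randomized response with flip probability 1 / (1 + e^epsilon). All function names are hypothetical. Under these assumptions each output alone is independent of s, while the pair reveals s XOR (noise).

```python
import itertools
import math


def rr_flip_prob(epsilon):
    """Flip probability of binary randomized response at privacy level epsilon."""
    return 1.0 / (1.0 + math.exp(epsilon))


def joint_distribution(epsilon):
    """Exact joint pmf over (s, y1, y2) for the assumed two-silo XOR construction."""
    q = rr_flip_prob(epsilon)
    pmf = {}
    # s: secret bit (uniform), r: mask bit (uniform), f1/f2: RR flip indicators.
    for s, r, f1, f2 in itertools.product((0, 1), repeat=4):
        p = 0.25 * (q if f1 else 1.0 - q) * (q if f2 else 1.0 - q)
        y1 = r ^ f1          # silo 1 releases RR(r)
        y2 = (r ^ s) ^ f2    # silo 2 releases RR(r XOR s)
        key = (s, y1, y2)
        pmf[key] = pmf.get(key, 0.0) + p
    return pmf


def mutual_information(pmf, x_idx, y_idx):
    """I(X;Y) in bits; x_idx and y_idx are tuples of coordinates of the pmf keys."""
    def marginal(idx):
        out = {}
        for key, p in pmf.items():
            sub = tuple(key[i] for i in idx)
            out[sub] = out.get(sub, 0.0) + p
        return out

    px, py, pxy = marginal(x_idx), marginal(y_idx), marginal(x_idx + y_idx)
    total = 0.0
    for key, p in pxy.items():
        if p > 0.0:
            x, y = key[:len(x_idx)], key[len(x_idx):]
            total += p * math.log2(p / (px[x] * py[y]))
    return total


pmf = joint_distribution(epsilon=1.0)
print("I(s; y1)     =", mutual_information(pmf, (0,), (1,)))
print("I(s; y2)     =", mutual_information(pmf, (0,), (2,)))
print("I(s; y1, y2) =", mutual_information(pmf, (0,), (1, 2)))
```

In this toy model the two single-silo values are zero up to floating-point error, and the joint value equals 1 - h(2q(1 - q)) bits, where q is the flip probability and h is the binary entropy function; this is strictly positive for any epsilon > 0.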