Frequency Estimation of Correlated Multi-attribute Data under Local Differential Privacy

Large-scale data collection, from national censuses to IoT-enabled smart homes, routinely gathers dozens of attributes per individual. These multi-attribute datasets are crucial for analytics but pose significant privacy risks. Local Differential Privacy (LDP) protects user privacy by letting each user perturb their record locally before releasing it to an untrusted data aggregator. However, existing LDP mechanisms either split the privacy budget across all attributes or treat each attribute independently, and in both cases they ignore natural inter-attribute correlations. The result is excessive noise and significant utility loss, particularly in high-dimensional datasets. We introduce a two-phase LDP framework that overcomes these limitations by privately learning and exploiting inter-attribute dependencies. In Phase I, a small subset of users applies a standard per-attribute LDP mechanism, enabling the aggregator to derive dependency information from the privatized data. In Phase II, each remaining user perturbs a single randomly chosen attribute with the full privacy budget, while the unreported attributes are reconstructed from Phase I statistics at no additional privacy cost. As a concrete instantiation, we develop Correlated Randomized Response (Corr-RR), which employs correlation-aware probabilistic mappings to substantially improve estimation accuracy. We prove that Corr-RR satisfies $\varepsilon$-LDP, and demonstrate through extensive experiments on synthetic and real-world datasets that it consistently outperforms state-of-the-art baselines, with the largest gains on high-dimensional and strongly correlated datasets.
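To make the budget trade-off concrete, the following is a minimal sketch of classic binary randomized response and of the two reporting strategies the abstract contrasts: splitting the budget across all attributes versus reporting one randomly chosen attribute with the full budget. It is an illustration under stated assumptions, not the Corr-RR mechanism itself: the abstract does not specify the correlation-aware mappings, so the reconstruction of unreported attributes is omitted. Attributes are assumed to be binary, and all function names are hypothetical.

```python
import math
import random


def rr_keep_probability(epsilon):
    """Probability of reporting the true bit under binary randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def rr_perturb(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it."""
    p = rr_keep_probability(epsilon)
    return bit if rng.random() < p else 1 - bit


def rr_estimate(reports, epsilon):
    """Unbiased estimate of the fraction of ones from perturbed reports.

    E[report] = f * p + (1 - f) * (1 - p), so f = (mean - (1 - p)) / (2p - 1).
    Requires epsilon > 0, otherwise 2p - 1 is zero.
    """
    p = rr_keep_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


def split_budget_reports(record, epsilon, rng):
    """Baseline: perturb every attribute, each with budget epsilon / d."""
    d = len(record)
    return [rr_perturb(bit, epsilon / d, rng) for bit in record]


def single_attribute_report(record, epsilon, rng):
    """Phase II style report: one random attribute, perturbed with full epsilon.

    Assumption: the attribute index is sampled uniformly and sent in the clear.
    """
    j = rng.randrange(len(record))
    return j, rr_perturb(record[j], epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    epsilon, n, true_freq = 1.0, 20000, 0.3
    bits = [1 if rng.random() < true_freq else 0 for _ in range(n)]
    reports = [rr_perturb(b, epsilon, rng) for b in bits]
    print("estimated frequency:", rr_estimate(reports, epsilon))
```

With d attributes, the split-budget baseline perturbs each bit with only epsilon / d, so the keep probability approaches 1/2 as d grows and the estimator's denominator 2p - 1 shrinks toward zero. Single-attribute reporting keeps the full epsilon per report but yields roughly n / d reports per attribute, which is the gap that the Phase I dependency statistics are meant to close.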

