Researchers increasingly use data on social and economic networks to study a range of social science questions, but releasing statistics derived from networks can raise significant privacy concerns. We show how to release network connectedness indices that quantify assortative mixing across node attributes under edge-adjacent differential privacy. Standard privacy techniques perform poorly in this setting, both because connectedness indices have high global sensitivity and because a single node's attribute can enter the connectedness calculation for thousands of cells, so the privacy loss composes poorly. Our method, which is straightforward to apply, first adds noise to node attributes, then analytically debiases downstream statistics, and finally applies a second layer of noise to protect the presence or absence of individual edges. We prove consistency and asymptotic normality of our estimators for both discrete and continuous labels, and show that our method works well in simulations and on real networks collected by social scientists, including networks with as few as 200 nodes.
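The three-step recipe described above (noise the attributes, debias, then noise the edge-level statistic) can be illustrated with a minimal sketch. This is not the paper's estimator: it assumes binary labels, uses textbook randomized response for the attribute noise and the Laplace mechanism for the edge noise, and targets a simple unnormalised statistic (the number of edges whose two endpoints both carry label 1) rather than a full connectedness index. All function names and the parameters `eps_label` and `eps_edge` are hypothetical, and the sketch assumes a graph without self-loops.

```python
import numpy as np


def randomized_response(labels, eps_label, rng):
    """Step 1 (assumed mechanism): keep each binary label with probability
    p = e^eps / (1 + e^eps) and flip it otherwise. Requires eps_label > 0."""
    p = np.exp(eps_label) / (1.0 + np.exp(eps_label))
    flip = rng.random(labels.shape[0]) >= p
    return np.where(flip, 1 - labels, labels), p


def debiased_indicator(noisy_labels, p):
    """Step 2: since E[z] = (1 - p) + (2p - 1) * x for a true label x,
    (z - (1 - p)) / (2p - 1) is an unbiased estimate of x."""
    return (noisy_labels - (1.0 - p)) / (2.0 * p - 1.0)


def private_within_group_edges(edges, labels, eps_label, eps_edge, rng):
    """Noisy, debiased count of edges joining two label-1 nodes.

    edges  : integer array of shape (m, 2), one row per edge, no self-loops
    labels : integer array of 0/1 node labels
    """
    noisy, p = randomized_response(labels, eps_label, rng)
    x_hat = debiased_indicator(noisy, p)
    # The label noise is independent across nodes, so the product of the two
    # endpoint estimates is unbiased for x_i * x_j on every edge.
    debiased = float(np.sum(x_hat[edges[:, 0]] * x_hat[edges[:, 1]]))
    # Step 3: adding or removing one edge changes the sum by at most
    # max|x_hat|^2 = (p / (2p - 1))^2, which sets the Laplace scale.
    sensitivity = (p / (2.0 * p - 1.0)) ** 2
    return debiased + rng.laplace(0.0, sensitivity / eps_edge)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = np.array([1, 1, 0, 1, 0])
    edges = np.array([[0, 1], [1, 2], [0, 3], [3, 4]])
    print(private_within_group_edges(edges, labels, 1.0, 1.0, rng))
```

The sketch also shows why debiasing interacts with the second noise layer: dividing by `2p - 1` inflates each endpoint estimate, so a stronger attribute-level guarantee (smaller `eps_label`) raises the sensitivity of the edge-level statistic and hence the Laplace scale.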