Estimating spatial distributions is important in data analysis tasks such as traffic flow forecasting and epidemic prevention. Accurate estimation requires collecting sufficient user data, but collecting data directly from individuals can compromise their privacy. Most previous work focuses on private distribution estimation for one-dimensional data; it does not account for the relationships among spatial data points and therefore yields poor accuracy for spatial distribution estimation. In this paper, we address the problem of private spatial distribution estimation: we collect spatial data from individuals under Local Differential Privacy (LDP) and aim to minimize the distance between the actual distribution and the estimated one. To leverage the numerical nature of the domain, we project spatial data and its relationships onto a one-dimensional distribution and then use this projection to estimate the overall spatial distribution. Specifically, we propose a reporting mechanism called the Disk Area Mechanism (DAM), which projects the spatial domain onto a line and optimizes the estimation using the sliced Wasserstein distance. Through extensive experiments on both real and synthetic data sets, we compare DAM with state-of-the-art methods such as the Multi-dimensional Square Wave Mechanism (MDSW) and the Subset Exponential Mechanism with Geo-I (SEM-Geo-I). Our results show that DAM always performs better than MDSW and outperforms SEM-Geo-I when the data granularity is fine enough.
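For readers unfamiliar with the sliced Wasserstein distance named in the abstract, the following is a minimal, generic sketch of how it can be estimated between two 2-D point sets. It is not the DAM mechanism and includes no privacy step. The function name `sliced_wasserstein`, its parameters, and the assumption that both point sets have the same number of points are illustrative choices, not taken from the paper.

```python
import numpy as np


def sliced_wasserstein(x, y, n_projections=200, p=1, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance.

    x, y: arrays of shape (n, 2) holding two equally sized 2-D point sets.
    This is an illustrative sketch, not the estimator used by DAM.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 2 or x.shape[1] != 2:
        raise ValueError("x and y must both have shape (n, 2)")

    # Draw random directions on the unit circle (angles in [0, pi) suffice,
    # because a direction and its opposite give the same 1-D distance).
    rng = np.random.default_rng(seed)
    angles = rng.uniform(0.0, np.pi, size=n_projections)
    directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (k, 2)

    # Project every point onto every direction, then sort each projection.
    # For equally sized samples, the 1-D Wasserstein distance is obtained by
    # matching sorted values position by position.
    px = np.sort(x @ directions.T, axis=0)  # (n, k)
    py = np.sort(y @ directions.T, axis=0)  # (n, k)

    # 1-D W_p^p per direction, averaged over directions, then the p-th root.
    per_direction = np.mean(np.abs(px - py) ** p, axis=0)
    return float(np.mean(per_direction) ** (1.0 / p))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    actual = rng.normal(size=(500, 2))
    estimated = actual + np.array([0.5, 0.0])  # a shifted copy
    print(sliced_wasserstein(actual, estimated))
```

Each random direction reduces the 2-D comparison to a 1-D one, which is the same intuition behind projecting the spatial domain onto a line; averaging over many directions recovers a distance that is sensitive to the full spatial layout.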