Directly releasing those data raises privacy and liability concerns (the latter arising, e.g., from unauthorized redistribution of such datasets), since location data contain sensitive user information such as regular movement patterns and favorite spots. To address this, we propose a novel fingerprinting scheme that simultaneously identifies unauthorized redistribution of location datasets and provides differential privacy guarantees for the shared data. Because differentially private mechanisms degrade data utility, we introduce a utility-focused post-processing scheme that restores spatio-temporal correlations between points in a location trajectory, and we integrate it into the fingerprinting scheme as a sampling method. As a result, the fingerprinting scheme mitigates the utility loss caused by the noise that differentially private mechanisms introduce: it embeds the fingerprint while preserving the publicly known statistics of the data. The entire process still satisfies differential privacy, owing to immunity to post-processing, a fundamental property of differential privacy. The scheme is robust against known, well-studied attacks on fingerprinting schemes, including random flipping attacks, correlation-based flipping attacks, and collusion among multiple parties, which makes it hard for attackers to infer the fingerprint codes and avoid accusation. In experiments on two real-life location datasets and two synthetic ones, our scheme achieves high fingerprinting robustness and outperforms existing approaches. It also increases the utility of differentially private datasets, which benefits data analysts.
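For reference, the post-processing property invoked above can be stated as follows. The symbols $\mathcal{M}$, $f$, $D$, $D'$, $S$, $T$, and $\varepsilon$ are notation assumed here for illustration and do not come from the abstract; this is the standard statement for pure $\varepsilon$-differential privacy and may differ from the exact variant the paper uses.

```latex
% Assumed notation: \mathcal{M} is a randomized mechanism, D and D' are
% neighboring datasets, and \varepsilon > 0 is the privacy budget.
%
% \mathcal{M} is \varepsilon-differentially private if, for all neighboring
% D, D' and every measurable output set S,
\[
  \Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S].
\]
% Post-processing: for any (possibly randomized) map f that does not access
% the private data directly, f \circ \mathcal{M} is also
% \varepsilon-differentially private, i.e. for every measurable set T,
\[
  \Pr[f(\mathcal{M}(D)) \in T] \;\le\; e^{\varepsilon}\,\Pr[f(\mathcal{M}(D')) \in T].
\]
```

Read this way, a fingerprinting or utility-restoring step plays the role of $f$: as long as it operates only on the already privatized output and on public information, the privacy budget $\varepsilon$ is unchanged.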