Publishing streaming data in a privacy-preserving manner has been a key research focus for many years. The problem is challenging, particularly because of the correlations prevalent within a data stream. Existing approaches either fail to leverage these correlations effectively, leading to a suboptimal utility-privacy tradeoff, or rely on complex mechanism designs whose computational complexity grows with the sequence length. In this paper, we introduce Sequence Information Privacy (SIP), a new privacy notion designed to guarantee privacy for an entire data stream while taking into account the intrinsic data correlations. We show that SIP provides a privacy guarantee comparable to that of local differential privacy (LDP) and admits a lightweight, modular mechanism design. We further study two online data release models (instantaneous and batched) and propose corresponding privacy-preserving data perturbation mechanisms. We numerically evaluate how correlations influence the noise added to data streams. Lastly, we conduct experiments on real-world data to compare the utility-privacy tradeoff of our approaches with that of existing approaches from the literature. The results show that our mechanisms improve utility by more than a factor of two over LDP-based mechanisms.