We consider the problem of sharing correlated data under a perfect information-theoretic privacy constraint. We focus on redaction (erasure) mechanisms, in which each data point is either withheld or released unchanged, and we measure utility by the average cardinality of the released set or, equivalently, by the expected Hamming distortion. Assuming the data are generated by a finite, time-homogeneous Markov chain, we study how to protect the initial state while maximizing the amount of shared data. We establish a connection between perfect privacy and window-based redaction schemes, showing that erasing data up to a strong stationary time preserves privacy under suitable conditions. We further study an optimal sequential redaction mechanism and prove that it admits an equivalent window interpretation. Notably, both mechanisms achieve the optimal distortion while redacting only a constant average number of data points, independent of the data length~$N$.
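As a rough illustration of the setting, the constraint and the utility measure can be written as follows. The notation is assumed for this sketch and is not taken from the source: $X_1,\dots,X_N$ denotes the Markov data with initial state $X_1$, $Y_i \in \{X_i, \ast\}$ denotes the released symbol (with $\ast$ marking an erasure), and $T \ge 1$ denotes a random time such as a strong stationary time.
\[
  I(X_1; Y_1,\dots,Y_N) = 0
  \qquad \text{(perfect privacy: the output is independent of the initial state),}
\]
\[
  \mathbb{E}\Bigl[\sum_{i=1}^{N} \mathbf{1}\{Y_i \neq X_i\}\Bigr]
  = N - \mathbb{E}\bigl[\,\bigl|\{\, i : Y_i = X_i \,\}\bigr|\,\bigr]
  \qquad \text{(expected Hamming distortion).}
\]
Under the further assumption that a window scheme erases exactly the points with index $i < T$, the distortion would be
\[
  \mathbb{E}\bigl[\min(T-1,\, N)\bigr] \le \mathbb{E}[T] - 1,
\]
which does not grow with $N$ whenever $\mathbb{E}[T] < \infty$. This is consistent with the claim that only a constant average number of data points is redacted, although the exact window convention used in the paper may differ.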