We consider the problem of sharing correlated data under a perfect information-theoretic privacy constraint. We focus on redaction (erasure) mechanisms, in which each data point is either withheld or released unchanged, and we measure utility by the average cardinality of the released set or, equivalently, by the expected Hamming distortion. Assuming the data are generated by a finite time-homogeneous Markov chain, we study how to protect the initial state while maximizing the amount of shared data. We establish a connection between perfect privacy and window-based redaction schemes, showing that erasing data up to a strong stationary time preserves privacy under suitable conditions. We further study an optimal sequential redaction mechanism and prove that it admits an equivalent window interpretation. Interestingly, we show that both mechanisms achieve the optimal distortion while redacting only a constant average number of data points, independent of the data length~$N$.