The fundamental trade-off between privacy and utility remains an active area of research. Our contribution is motivated by two observations. First, privacy mechanisms developed for one-time data release cannot be extended straightforwardly to sequential releases. Second, practical databases are likely to be useful to multiple distinct parties, and we cannot rule out the possibility that these parties share data with one another. With utility in mind, we formulate a privacy-utility trade-off problem that adaptively handles sequential data requests made by different, potentially colluding entities. We measure utility by either expected distortion or mutual information, and we measure privacy by mutual information. We assume an attack model in which illicit data sharing, which we call collusion, can occur between data receivers. We develop an adaptive data-release algorithm based on a modified Blahut-Arimoto algorithm. We show that the resulting data releases are optimal when expected distortion quantifies utility, and locally optimal when mutual information quantifies utility. Finally, we discuss how our findings may extend to applications in machine learning.
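For orientation, the following is a minimal sketch of the standard (unmodified) Blahut-Arimoto iteration applied to a single, one-time release. It is not the adaptive, collusion-aware algorithm developed in this work. It assumes a finite data alphabet, a known prior `p_x`, a distortion matrix `dist`, and a trade-off weight `beta`; these names, and the function `blahut_arimoto` itself, are illustrative assumptions. The sketch minimizes the leakage I(X;Y) plus `beta` times the expected distortion by alternating between the release mechanism p(y|x) and its output marginal q(y).

```python
import math


def blahut_arimoto(p_x, dist, beta, n_iter=500, tol=1e-12):
    """Standard Blahut-Arimoto iteration for min I(X;Y) + beta * E[d(X,Y)].

    p_x  : prior over the private data X (list of probabilities)
    dist : dist[x][y] is the distortion of releasing y when the data is x
    beta : weight on distortion (larger beta favours utility over privacy)
    Returns the mechanism p(y|x), the leakage in bits, and the expected distortion.
    """
    n_x, n_y = len(dist), len(dist[0])
    q_y = [1.0 / n_y] * n_y  # output marginal, initialised to uniform
    mech = []
    for _ in range(n_iter):
        # Step 1: p(y|x) proportional to q(y) * exp(-beta * d(x, y)).
        mech = []
        for x in range(n_x):
            row = [q_y[y] * math.exp(-beta * dist[x][y]) for y in range(n_y)]
            total = sum(row)
            mech.append([v / total for v in row])
        # Step 2: q(y) is the marginal induced by the prior and the mechanism.
        q_new = [sum(p_x[x] * mech[x][y] for x in range(n_x)) for y in range(n_y)]
        converged = max(abs(a - b) for a, b in zip(q_new, q_y)) < tol
        q_y = q_new
        if converged:
            break

    # Evaluate privacy leakage I(X;Y) in bits and expected distortion.
    leakage = 0.0
    distortion = 0.0
    for x in range(n_x):
        for y in range(n_y):
            joint = p_x[x] * mech[x][y]
            distortion += joint * dist[x][y]
            if joint > 0.0:
                leakage += joint * math.log2(mech[x][y] / q_y[y])
    return mech, leakage, distortion


if __name__ == "__main__":
    # Toy example (an assumption): a uniform binary record with Hamming distortion.
    prior = [0.5, 0.5]
    hamming = [[0.0, 1.0], [1.0, 0.0]]
    for beta in (0.5, 3.0):
        _, leak, d = blahut_arimoto(prior, hamming, beta)
        print(f"beta={beta}: leakage={leak:.4f} bits, distortion={d:.4f}")
```

Sweeping `beta` traces out the single-release trade-off curve: a larger `beta` lowers the expected distortion at the cost of higher leakage. Handling sequential requests and colluding receivers requires the modifications developed in this work, which this sketch does not attempt to reproduce.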