Protecting the privacy of users who share personal information has become a growing priority in business and scientific settings, where the use of diverse types of data and the laws that protect them have expanded in tandem. Several techniques, such as k-anonymity and {\epsilon}-differential privacy, have been widely developed for static publication, i.e., where the information is published only once. For microdata that is published dynamically, established notions such as m-invariance and {\tau}-safety already exist, but efforts to improve utility remain superficial. We propose a new heuristic for the NP-hard combinatorial problem of m-invariance and {\tau}-safety, based on a column generation scheme from mathematical optimization. The quality of a solution to m-invariance and {\tau}-safety can be measured by the Information Loss (IL), a value in [0, 100] where values closer to 0 are better. We show that our approach substantially improves on current heuristics: on some instances it yields solutions with ILs of 1.87, 8.5 and 1.93, whereas state-of-the-art methods reported ILs of 39.03, 51.84 and 57.97, respectively.