Differential privacy provides a strong form of privacy while preserving most of the original characteristics of the dataset. Realizing these benefits requires data analysis algorithms designed specifically to be differentially private. In this work, we present three tree-based algorithms for mining redescriptions while preserving differential privacy. Redescription mining is an exploratory data analysis method for finding connections between two views over the same entities, such as the phenotypes and genotypes of medical patients. It has applications in many fields, including some, like health care informatics, where privacy-preserving access to data is desired. Our algorithms are the first differentially private redescription mining algorithms, and we show via experiments that, despite the inherent noise in differential privacy, they can return trustworthy results even on smaller datasets, where noise typically has a stronger effect.