Data sharing enables critical advances in many research areas and business applications, but it may lead to inadvertent disclosure of sensitive summary statistics (e.g., means or quantiles). Existing literature focuses only on protecting a single confidential quantity, whereas in practice data sharing involves multiple sensitive statistics. We propose a novel framework to define, analyze, and protect multi-secret summary statistics privacy in data sharing. Specifically, we measure the privacy risk of any data release mechanism by the worst-case probability that an attacker successfully infers the summary statistic secrets. Because an attacker's objective may range from inferring a subset of the secrets to inferring all of them, we systematically design and analyze privacy metrics tailored to each objective. Defining distortion as the worst-case distance between the original and released data distributions, we analyze the tradeoff between privacy and distortion. We also design and analyze data release mechanisms tailored to different data distributions and secret types. Evaluations on real-world data demonstrate the effectiveness of our mechanisms in practical applications.