Gaussian Mixture Models (GMMs) are widely used statistical models for representing multi-modal data distributions, with numerous applications in data mining, pattern recognition, data simulation, and machine learning. However, recent research has shown that releasing GMM parameters poses significant privacy risks, potentially exposing sensitive information about the underlying data. In this paper, we address the challenge of releasing GMM parameters while ensuring differential privacy (DP) guarantees. Specifically, we focus on the privacy protection of mixture weights, component means, and covariance matrices. We propose to use Kullback-Leibler (KL) divergence as a utility metric to assess the accuracy of the released GMM, as it captures the joint impact of noise perturbation on all the model parameters. To achieve privacy, we introduce a DP mechanism that adds carefully calibrated random perturbations to the GMM parameters. Through theoretical analysis, we quantify the effects of privacy budget allocation and perturbation statistics on the DP guarantee, and derive a tractable expression for evaluating KL divergence. We formulate and solve an optimization problem to minimize the KL divergence between the released and original models, subject to a given $(ε, δ)$-DP constraint. Extensive experiments on both synthetic and real-world datasets demonstrate that our approach achieves strong privacy guarantees while maintaining high utility.
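To make the perturb-then-measure idea concrete, the following is a minimal sketch and not the mechanism proposed in this paper. It adds Gaussian noise to the weights, means, and covariances of a small GMM, repairs the result so it remains a valid GMM, and estimates the KL divergence between the original and perturbed models by Monte Carlo sampling. All function names (`gmm_logpdf`, `gmm_sample`, `perturb_gmm`, `kl_monte_carlo`) and the noise scales are hypothetical choices for illustration. The noise scales are not calibrated to any $(ε, δ)$-DP guarantee, and the Monte Carlo estimator stands in for the tractable KL expression derived in the paper.

```python
import numpy as np


def gmm_logpdf(x, weights, means, covs):
    """Log-density of a GMM evaluated at the rows of x, shape (n, d)."""
    d = x.shape[1]
    comp = []
    for w, mu, cov in zip(weights, means, covs):
        chol = np.linalg.cholesky(cov)
        sol = np.linalg.solve(chol, (x - mu).T)      # shape (d, n)
        maha = np.sum(sol ** 2, axis=0)              # squared Mahalanobis distance
        logdet = 2.0 * np.sum(np.log(np.diag(chol)))
        comp.append(np.log(w) - 0.5 * (d * np.log(2 * np.pi) + logdet + maha))
    comp = np.stack(comp, axis=0)                    # shape (k, n)
    m = comp.max(axis=0)
    return m + np.log(np.exp(comp - m).sum(axis=0))  # stable log-sum-exp


def gmm_sample(rng, n, weights, means, covs):
    """Draw n samples from the GMM."""
    labels = rng.choice(len(weights), size=n, p=weights)
    out = np.empty((n, means.shape[1]))
    for k in range(len(weights)):
        idx = labels == k
        out[idx] = rng.multivariate_normal(means[k], covs[k], size=int(idx.sum()))
    return out


def perturb_gmm(rng, weights, means, covs, sigma_w, sigma_mu, sigma_cov):
    """Add Gaussian noise to every parameter, then repair validity.

    ASSUMPTION: the sigma_* values are arbitrary illustration values,
    NOT calibrated to a DP guarantee.
    """
    w = np.clip(weights + rng.normal(0.0, sigma_w, weights.shape), 1e-6, None)
    w = w / w.sum()                                  # back onto the simplex
    mu = means + rng.normal(0.0, sigma_mu, means.shape)
    new_covs = []
    for cov in covs:
        noise = rng.normal(0.0, sigma_cov, cov.shape)
        noisy = cov + 0.5 * (noise + noise.T)        # keep the matrix symmetric
        vals, vecs = np.linalg.eigh(noisy)
        vals = np.clip(vals, 1e-6, None)             # force positive definiteness
        new_covs.append((vecs * vals) @ vecs.T)
    return w, mu, np.array(new_covs)


def kl_monte_carlo(rng, p, q, n=20000):
    """Monte Carlo estimate of KL(p || q) using samples drawn from p."""
    x = gmm_sample(rng, n, *p)
    return float(np.mean(gmm_logpdf(x, *p) - gmm_logpdf(x, *q)))


# Toy two-component GMM in two dimensions (illustrative values only).
rng = np.random.default_rng(0)
original = (
    np.array([0.4, 0.6]),
    np.array([[0.0, 0.0], [4.0, 4.0]]),
    np.array([np.eye(2), np.eye(2)]),
)
released = perturb_gmm(rng, *original, sigma_w=0.05, sigma_mu=0.5, sigma_cov=0.1)
kl_value = kl_monte_carlo(rng, original, released)
print("estimated KL(original || released):", kl_value)
```

In this sketch, larger noise scales generally produce a larger estimated KL divergence, which is the utility cost that the optimization described above trades off against the privacy constraint.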