Data privacy is critical in the AI era, and differential privacy (DP) is one of the gold-standard solutions. However, DP is typically applicable only when the underlying data distribution is bounded. We address this limitation by leveraging second-moment information from a small amount of public data. We propose Public-moment-guided Truncation (PMT), which transforms private data using the public second-moment matrix and applies a principled truncation whose radius depends only on non-private quantities: the data dimension and sample size. This transformation yields a well-conditioned second-moment matrix, enabling its inversion with significantly improved robustness to DP noise. Furthermore, we demonstrate the applicability of PMT to penalized linear regression and generalized linear regression. Specifically, we design new loss functions and algorithms that ensure solutions in the transformed space can be mapped back to the original domain. We establish improvements in the models' DP estimation through theoretical error bounds, robustness guarantees, and convergence results, attributing the gains to the conditioning effect of PMT. Experiments on synthetic and real datasets confirm that PMT substantially improves the accuracy and stability of DP models.
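To make the mechanism concrete, the following is a minimal illustrative sketch of the PMT idea: whiten private rows with the public second-moment matrix, clip each row to a radius determined only by non-private quantities, and release a noisy second-moment estimate via the Gaussian mechanism. The specific radius formula `R = sqrt(d * log(n))`, the noise calibration, and the function name are assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

def pmt_second_moment(X_priv, M_pub, eps, delta, rng=None):
    """Illustrative PMT-style release of a DP second-moment matrix.

    X_priv : (n, d) private data matrix.
    M_pub  : (d, d) public second-moment matrix (assumed positive definite).
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X_priv.shape

    # Whiten with the public second moment: if x has second moment M_pub,
    # then L^{-1} x is roughly isotropic, where M_pub = L L^T.
    L = np.linalg.cholesky(M_pub)
    Z = X_priv @ np.linalg.inv(L).T

    # Truncation radius uses only non-private quantities (d and n);
    # this particular formula is an illustrative assumption.
    R = np.sqrt(d * np.log(n))
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    Z = Z * np.minimum(1.0, R / np.maximum(norms, 1e-12))  # clip rows to radius R

    # Gaussian mechanism: swapping one clipped row changes the normalized
    # second moment by at most R^2 / n in Frobenius norm.
    sens = R**2 / n
    sigma = sens * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    E = rng.normal(scale=sigma, size=(d, d))
    E = (E + E.T) / np.sqrt(2.0)  # symmetrize the noise

    # Well-conditioned after truncation, so inversion tolerates the noise.
    return Z.T @ Z / n + E
```

Because truncation bounds every row, the released matrix has bounded sensitivity, and the whitening step is what keeps its condition number small enough for a stable inverse.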