This paper explores the use of affine hulls of points as a means of representing data via learning in Reproducing Kernel Hilbert Spaces (RKHS), with the goal of partitioning the data space into geometric bodies that conceal privacy-sensitive information about individual data points while preserving the structure of the original learning problem. To this end, we introduce the Kernel Affine Hull Machine (KAHM), which provides an effective way of computing a distance measure from the resulting bounded geometric body. KAHM is a critical building block in wide and deep autoencoders, which enable data representation learning for classification applications. To ensure privacy-preserving learning, we propose a novel method for generating fabricated data, which involves smoothing differentially private data samples through a transformation process. The resulting fabricated data not only guarantees differential privacy but also ensures that the KAHM modeling error is no larger than that of the original training data samples. We also use fabricated data to address the accuracy loss that arises with differentially private classifiers. This approach significantly reduces the risk of membership inference attacks while incurring only a marginal loss of accuracy. As an application, we introduce a KAHM-based differentially private federated learning scheme in which evaluating the global classifier requires only locally computed distance measures. Overall, our findings demonstrate the potential of KAHM as an effective tool for privacy-preserving learning and classification.
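To make the idea of a distance measured from an affine hull in an RKHS concrete, the following is a minimal sketch and not the KAHM itself. It computes the feature-space distance from a query point to the plain, unbounded affine hull of a set of samples. The RBF kernel, the ridge term `reg`, and the names `rbf_kernel` and `affine_hull_distance` are all assumptions introduced here for illustration; the abstract does not specify the kernel, the regularization, or how the bounded geometric body is constructed.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def affine_hull_distance(X, x, gamma=1.0, reg=1e-6):
    """Distance in feature space from phi(x) to the affine hull of {phi(x_i)}.

    Solves  min_alpha ||phi(x) - sum_i alpha_i phi(x_i)||^2 + reg * ||alpha||^2
    subject to sum_i alpha_i = 1, using a Lagrange multiplier for the constraint.
    The ridge term `reg` is an assumption added for numerical stability.
    """
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    k_x = rbf_kernel(X, x[None, :], gamma).ravel()
    A = K + reg * np.eye(n)
    ones = np.ones(n)
    u = np.linalg.solve(A, k_x)    # unconstrained ridge solution
    v = np.linalg.solve(A, ones)   # direction used to enforce sum(alpha) = 1
    mu = (1.0 - ones @ u) / (ones @ v)
    alpha = u + mu * v
    # k(x, x) = 1 for the RBF kernel.
    d2 = 1.0 - 2.0 * alpha @ k_x + alpha @ K @ alpha
    return float(np.sqrt(max(d2, 0.0)))

# Toy data (hypothetical): five points in the plane.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
d_train = affine_hull_distance(X, np.array([1.0, 0.0]))   # a training sample
d_far = affine_hull_distance(X, np.array([10.0, 10.0]))   # a distant query
```

A training sample lies in the hull, so its distance is close to zero, while a distant query gets a clearly larger value. A distance of this kind could in principle support the classification described above, for instance by assigning a query to the class whose hull is nearest, though the paper's actual decision rule is not given in the abstract.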