Training data privacy is a fundamental problem in modern Artificial Intelligence (AI) applications such as face recognition, recommendation systems, and language generation, since training data may contain sensitive user information and its misuse raises legal concerns. To understand at a fundamental level how privacy mechanisms work in AI applications, we study differential privacy (DP) in the Neural Tangent Kernel (NTK) regression setting: DP is one of the most powerful tools for measuring privacy in statistical learning, and the NTK is one of the most popular frameworks for analyzing the learning dynamics of deep neural networks. We prove guarantees on both the differential privacy and the test accuracy of our NTK regression. Furthermore, we conduct experiments on the standard image classification dataset CIFAR-10 to show that NTK regression preserves good accuracy under a modest privacy budget, supporting the validity of our analysis. To our knowledge, this is the first work to provide a DP guarantee for NTK regression.
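To make the setting concrete, the sketch below shows NTK kernel ridge regression privatized by the Gaussian mechanism applied to the predictions (output perturbation). This is a minimal illustration only, not the paper's actual mechanism or analysis: the two-layer ReLU NTK formula uses one common normalization, and the sensitivity bound `clip` is an assumed placeholder.

```python
import numpy as np

def ntk_two_layer_relu(X1, X2):
    # Infinite-width two-layer ReLU NTK (one common normalization).
    # Inputs are assumed normalized to unit norm, so X1 @ X2.T is a cosine.
    u = np.clip(X1 @ X2.T, -1.0, 1.0)
    k0 = (np.pi - np.arccos(u)) / np.pi                             # arc-cosine kernel, order 0
    k1 = (u * (np.pi - np.arccos(u)) + np.sqrt(1.0 - u**2)) / np.pi  # arc-cosine kernel, order 1
    return u * k0 + k1

def dp_ntk_regression(X_train, y_train, X_test, eps, delta, clip=1.0, ridge=1e-3):
    # Kernel ridge regression on the NTK, privatized by adding Gaussian
    # noise to the test predictions. The sensitivity bound `clip` is a
    # placeholder assumption, not a derived quantity.
    K = ntk_two_layer_relu(X_train, X_train)
    alpha = np.linalg.solve(K + ridge * np.eye(len(X_train)), y_train)
    preds = ntk_two_layer_relu(X_test, X_train) @ alpha
    # Standard Gaussian-mechanism noise scale for (eps, delta)-DP.
    sigma = clip * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return preds + np.random.default_rng(0).normal(0.0, sigma, preds.shape)
```

As `eps` shrinks (a tighter privacy budget), `sigma` grows and the noisy predictions drift further from the non-private kernel regression, which is the privacy-accuracy trade-off the abstract refers to.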