Because many function classes face statistical lower bounds on learnability under privacy constraints, there has been recent interest in leveraging public data to improve the performance of private learning algorithms. In this model, algorithms must always guarantee differential privacy with respect to the private samples, while also ensuring learning guarantees when the private data distribution is sufficiently close to that of the public data. Previous work has demonstrated that when sufficient public, unlabelled data is available, private learning can be made statistically tractable, but the resulting algorithms have all been computationally inefficient. In this work, we present the first computationally efficient algorithms to provably leverage public data to learn privately whenever a function class is learnable non-privately, where computational efficiency is measured by the number of calls to an optimization oracle for the function class. In addition to this general result, we provide specialized algorithms with improved sample complexities in the special cases where the function class is convex or the task is binary classification.