A private learner is trained on a sample of labeled points and generates a hypothesis that can be used to predict the labels of newly sampled points while protecting the privacy of the training set [Kasiviswanathan et al., FOCS 2008]. Prior research has shown that private learners may require significantly higher sample complexity than non-private learners, as is the case, e.g., for learning one-dimensional threshold functions [Bun et al., FOCS 2015; Alon et al., STOC 2019]. We explore prediction as an alternative to learning. Instead of putting forward a hypothesis, a predictor answers a stream of classification queries. Earlier work considered a private prediction model with just a single classification query [Dwork and Feldman, COLT 2018]. We observe that when answering a stream of queries, a predictor must modify the hypothesis it uses over time and, furthermore, must use the queries themselves for this modification, which introduces potential privacy risks with respect to the queries. We introduce private everlasting prediction, which takes into account the privacy of both the training set and the (adaptively chosen) queries made to the predictor. We then present a generic construction of private everlasting predictors in the PAC model. The sample complexity of the initial training sample in our construction is quadratic (up to polylog factors) in the VC dimension of the concept class. Our construction allows prediction for all concept classes with finite VC dimension, and in particular for threshold functions with a constant-size initial training sample, even over infinite domains. In contrast, it is known that the sample complexity of privately learning threshold functions must grow as a function of the domain size, and hence private learning of thresholds is impossible over infinite domains.