In this paper we revisit the classical problem of classification under privacy constraints. Under such constraints, the raw data $(X_1,Y_1),\ldots,(X_n,Y_n)$ cannot be observed directly, and every classifier is a function of the randomised output of a suitable local differential privacy mechanism. The statistician is free to choose the form of this mechanism; here we add Laplace-distributed noise to a discretisation of the location of each feature vector $X_i$ and to its label $Y_i$. The resulting classification rule is a privatised version of the well-studied partitioning classification rule. In addition to the standard Lipschitz and margin conditions, we introduce a novel characteristic, by means of which the exact rate of convergence of the classification error probability is calculated for both non-private and private data.
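The mechanism described in the abstract can be illustrated with a minimal sketch. It is not the paper's exact construction: the one-dimensional features on $[0,1]$, the $k$ equal-width cells, the labels in $\{0,1\}$, the noise scale $4/\varepsilon$, and the function names `privatise`, `fit_partition_rule` and `predict` are all assumptions made for illustration. Each data holder releases Laplace-perturbed cell indicators, and the classifier takes a majority vote per cell on the noisy counts.

```python
import numpy as np


def privatise(x, y, k, eps, rng):
    """Release noisy cell indicators for features x in [0, 1] and labels y in {0, 1}."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Index of the equal-width cell containing each x_i (x = 1 goes to the last cell).
    cells = np.minimum((x * k).astype(int), k - 1)
    onehot = np.zeros((n, k))
    onehot[np.arange(n), cells] = 1.0
    # Assumption: changing one record moves the pair (onehot_i, y_i * onehot_i)
    # by at most 4 in L1 norm, so Laplace noise of scale 4 / eps per coordinate
    # makes the released pair eps-locally differentially private.
    scale = 4.0 / eps
    z = onehot + rng.laplace(scale=scale, size=(n, k))
    w = onehot * y[:, None] + rng.laplace(scale=scale, size=(n, k))
    return z, w


def fit_partition_rule(z, w):
    """Predict 1 in a cell when the noisy label-1 count exceeds half the noisy cell count."""
    return (2.0 * w.sum(axis=0) > z.sum(axis=0)).astype(int)


def predict(x_new, cell_labels):
    """Classify new points by looking up the label assigned to their cell."""
    k = len(cell_labels)
    x_new = np.asarray(x_new, dtype=float)
    cells = np.minimum((x_new * k).astype(int), k - 1)
    return cell_labels[cells]


# Usage on synthetic data: the label is 1 exactly when x > 0.5.
rng = np.random.default_rng(0)
x = rng.uniform(size=200_000)
y = (x > 0.5).astype(int)
z, w = privatise(x, y, k=4, eps=1.0, rng=rng)
cell_labels = fit_partition_rule(z, w)
print(cell_labels, predict([0.1, 0.9], cell_labels))
```

Only the perturbed arrays `z` and `w` reach the classifier, never `x` or `y`. The summed Laplace noise has standard deviation of order $\sqrt{n}/\varepsilon$ per cell, so the sample size must be large relative to $k$ and $1/\varepsilon$ before the per-cell vote becomes reliable.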