To the best of our knowledge, there are no methods today for training differentially private regression models on sparse input data. To remedy this, we adapt the Frank-Wolfe algorithm for $L_1$-penalized linear regression so that it is aware of sparse inputs and exploits them effectively. In doing so, we reduce the training time of the algorithm from $\mathcal{O}(T D S + T N S)$ to $\mathcal{O}(N S + T \sqrt{D} \log{D} + T S^2)$, where $T$ is the number of iterations and $S$ is the sparsity rate of a dataset with $N$ rows and $D$ features. Our results demonstrate that this procedure can speed up training by up to $2{,}200\times$, depending on the value of the privacy parameter $\epsilon$ and the sparsity of the dataset.
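For orientation, the following is a minimal, non-private sketch of a Frank-Wolfe iteration for $L_1$-constrained least squares on sparse rows. It is an illustration under stated assumptions, not the algorithm proposed here: the function name `frank_wolfe_lasso`, the dict-of-nonzeros row format, the `radius` parameter for the $L_1$ ball, and the $2/(t+2)$ step size are all hypothetical choices. No privacy noise is added, and the gradient is recomputed from scratch at every iteration, so the sketch corresponds to the slower baseline cost rather than the improved bound.

```python
def frank_wolfe_lasso(rows, y, n_features, radius, n_iters):
    """Non-private Frank-Wolfe for least squares over an L1 ball.

    rows[i] is a dict {feature_index: value} holding only the nonzero
    entries of row i (an assumed sparse format for this sketch).
    """
    n = len(rows)
    w = [0.0] * n_features
    for t in range(n_iters):
        # Gradient of (1 / 2n) * ||Xw - y||^2, visiting only nonzeros.
        grad = [0.0] * n_features
        for row, target in zip(rows, y):
            residual = sum(v * w[j] for j, v in row.items()) - target
            for j, v in row.items():
                grad[j] += residual * v / n

        # Linear minimization over the L1 ball: the minimizer is a single
        # signed coordinate scaled to the ball radius. A private variant
        # would replace this exact argmax with a noisy selection step.
        j_star = max(range(n_features), key=lambda j: abs(grad[j]))
        vertex_value = -radius if grad[j_star] > 0 else radius

        # Convex combination of the current iterate and the chosen vertex,
        # which keeps the iterate inside the L1 ball.
        step = 2.0 / (t + 2.0)
        w = [(1.0 - step) * wj for wj in w]
        w[j_star] += step * vertex_value
    return w
```

Each iteration moves toward a single coordinate, so the iterate after $T$ steps has at most $T$ nonzero weights. The dense gradient pass and the full scan for the largest coordinate are the per-iteration costs that a sparsity-aware variant would aim to avoid.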