Linear regression is a fundamental tool for statistical analysis. Its widespread use has motivated the development of linear regression methods that also satisfy differential privacy and thus guarantee that the learned model reveals little about any one data point used to construct it. However, existing differentially private solutions assume that the end user can easily specify good data bounds and hyperparameters, and both requirements present significant practical obstacles. In this paper, we study an algorithm that uses the exponential mechanism to select a model with high Tukey depth from a collection of non-private regression models. Given $n$ samples of $d$-dimensional data used to train $m$ models, we construct an efficient analogue using an approximate Tukey depth that runs in time $O(d^2n + dm\log(m))$. We find that this algorithm obtains strong empirical performance in the data-rich setting with no data bounds or hyperparameter selection required.
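To make the pipeline described above concrete, the following is a minimal Python sketch under stated assumptions, not the paper's implementation. The function names (`fit_models`, `approx_tukey_depth`, `exponential_mechanism`), the choice to fit each model on a disjoint chunk of the data, and the coordinate-wise definition of approximate depth are all assumptions made for illustration. In particular, the sketch selects only among the $m$ fitted candidates, which is a simplification and should not be relied on for a formal differential privacy guarantee.

```python
import numpy as np


def fit_models(X, y, m):
    """Fit m ordinary least squares models on disjoint chunks of the data.

    Assumption: each chunk holds at least d rows so every fit is well posed.
    """
    parts = np.array_split(np.arange(len(X)), m)
    return np.stack([np.linalg.lstsq(X[p], y[p], rcond=None)[0] for p in parts])


def approx_tukey_depth(models):
    """Coordinate-wise approximate Tukey depth of each model among all m models.

    For each coordinate, count how many models lie at or below and at or
    above the model's value, then take the minimum over both sides and
    over all coordinates. Ties are broken arbitrarily by the sort.
    """
    m, _ = models.shape
    ranks = models.argsort(axis=0).argsort(axis=0)  # 0-based rank per coordinate
    below = ranks + 1   # models with coordinate <= this one
    above = m - ranks   # models with coordinate >= this one
    return np.minimum(below, above).min(axis=1)


def exponential_mechanism(scores, epsilon, rng, sensitivity=1.0):
    """Sample an index with probability proportional to exp(eps * score / (2 * sensitivity))."""
    logits = epsilon * np.asarray(scores, dtype=float) / (2.0 * sensitivity)
    probs = np.exp(logits - logits.max())  # subtract the max for numerical stability
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, m = 2000, 3, 50                      # hypothetical sizes
    X = rng.normal(size=(n, d))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

    models = fit_models(X, y, m)               # shape (m, d)
    depths = approx_tukey_depth(models)        # shape (m,)
    chosen = exponential_mechanism(depths, epsilon=1.0, rng=rng)
    print(models[chosen])
```

In this sketch, fitting the models on disjoint chunks costs roughly $O(d^2 n)$ in total and the per-coordinate sorts cost $O(dm\log(m))$, which lines up with the two terms of the running time quoted above. Note that no data bounds or tuning parameters other than the privacy budget `epsilon` and the number of models `m` appear in the code.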