In this paper, we revisit sparse linear regression in the local differential privacy (LDP) model. Existing work in the non-interactive and sequentially interactive local models has focused on lower bounds for the case where the underlying parameter is $1$-sparse, and extending these bounds to the general $k$-sparse case has proven challenging. Moreover, it is unclear whether efficient non-interactive LDP (NLDP) algorithms exist. To address these issues, we first consider the problem in the $\epsilon$-NLDP model and provide a lower bound of $\Omega(\frac{\sqrt{dk\log d}}{\sqrt{n}\epsilon})$ on the $\ell_2$-norm estimation error for sub-Gaussian data, where $n$ is the sample size, $d$ is the dimension of the space, and $k$ is the sparsity of the parameter. We then propose an NLDP algorithm, the first of its kind for this problem, which also yields a novel and efficient estimator as a by-product. The algorithm achieves an upper bound of $\tilde{O}({\frac{d\sqrt{k}}{\sqrt{n}\epsilon}})$ on the estimation error when the data are sub-Gaussian, which can be improved by a factor of $O(\sqrt{d})$ if the server has access to additional public but unlabeled data. For the sequentially interactive LDP model, we show a similar lower bound of $\Omega({\frac{\sqrt{dk}}{\sqrt{n}\epsilon}})$. For the upper bound, we rectify a previous method and show that a bound of $\tilde{O}(\frac{k\sqrt{d}}{\sqrt{n}\epsilon})$ is achievable. Our findings reveal fundamental differences among the non-private case, the central DP model, and the local DP model for sparse linear regression.
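
To make the gaps between the stated bounds concrete, the ratios of upper to lower bounds can be worked out directly from the rates above. This is only arithmetic on the quoted rates, ignoring the logarithmic factors hidden in $\tilde{O}(\cdot)$, and it assumes that "improved by a factor of $O(\sqrt{d})$" means the non-interactive upper bound is divided by $\sqrt{d}$:

$$
\underbrace{\frac{d\sqrt{k}/(\sqrt{n}\epsilon)}{\sqrt{dk\log d}/(\sqrt{n}\epsilon)}}_{\text{non-interactive}} = \sqrt{\frac{d}{\log d}},
\qquad
\underbrace{\frac{\sqrt{dk}/(\sqrt{n}\epsilon)}{\sqrt{dk\log d}/(\sqrt{n}\epsilon)}}_{\text{non-interactive, public data}} = \frac{1}{\sqrt{\log d}},
\qquad
\underbrace{\frac{k\sqrt{d}/(\sqrt{n}\epsilon)}{\sqrt{dk}/(\sqrt{n}\epsilon)}}_{\text{sequentially interactive}} = \sqrt{k}.
$$

Under this reading, the non-interactive upper and lower bounds differ by roughly a $\sqrt{d}$ factor without public data and agree up to logarithmic factors with it, while the sequentially interactive bounds differ by a $\sqrt{k}$ factor.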