This paper studies semi-supervised sparse statistical inference in a distributed setup. We develop an efficient multi-round distributed debiased estimator that integrates both labeled and unlabeled data. We show that the additional unlabeled data helps to improve the statistical rate of each round of iteration. Our approach offers debiasing methods for $M$-estimation and generalized linear models that are tailored to the specific form of the loss function. Our method also applies to non-smooth losses such as the absolute deviation loss. Furthermore, our algorithm is computationally efficient, since it requires estimating a high-dimensional inverse covariance matrix only once. We demonstrate the effectiveness of our method through simulation studies and real data applications that highlight the benefits of incorporating unlabeled data.