In multiple domains, statistical tasks are performed in distributed settings, with data split among several end machines that are connected to a fusion center. In various applications, the end machines have limited bandwidth and power, and thus a tight communication budget. In this work we focus on distributed learning of a sparse linear regression model under severe communication constraints. We propose several two-round distributed schemes whose communication per machine is sublinear in the data dimension. In our schemes, individual machines compute debiased lasso estimators but send only very few values to the fusion center. On the theoretical front, we analyze one of these schemes and prove that with high probability it achieves exact support recovery at low signal-to-noise ratios, where individual machines fail to recover the support. We show in simulations that our scheme works as well as, and in some cases better than, more communication-intensive approaches.
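To make the flavor of such a scheme concrete, the following minimal sketch (not the paper's exact algorithm) illustrates a one-round variant: each machine fits a lasso, applies a one-step debiasing correction, and transmits only the indices of its top-k debiased coordinates; the fusion center recovers the support by majority vote. All problem sizes, the regularization level, the identity debiasing matrix (a valid choice for an isotropic Gaussian design), and the voting threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np
from collections import Counter
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Problem setup (all sizes are illustrative):
# M machines, each holding n samples of dimension p; a shared k-sparse beta.
M, n, p, k = 20, 100, 500, 5
beta = np.zeros(p)
support = rng.choice(p, size=k, replace=False)
beta[support] = 0.5  # weak signal: single machines struggle to recover it

def debiased_lasso(X, y, lam):
    """Lasso fit plus a one-step debiasing correction.

    For an isotropic Gaussian design (Sigma = I), the debiasing matrix
    can be taken as the identity, giving
        beta_d = beta_hat + X^T (y - X beta_hat) / n.
    """
    beta_hat = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
    return beta_hat + X.T @ (y - X @ beta_hat) / X.shape[0]

# Each machine sends only the indices of its top-k debiased coordinates,
# i.e., O(k log p) bits -- sublinear in the dimension p.
votes = Counter()
for _ in range(M):
    X = rng.standard_normal((n, p))
    y = X @ beta + rng.standard_normal(n)
    beta_d = debiased_lasso(X, y, lam=0.1)
    top_k = np.argsort(np.abs(beta_d))[-k:]
    votes.update(top_k.tolist())

# Fusion center: keep coordinates reported by at least half the machines.
estimated = sorted(i for i, c in votes.items() if c >= M / 2)
print("true support:     ", sorted(support.tolist()))
print("estimated support:", estimated)
```

A second round, as in the two-round schemes mentioned above, could for example have machines report values only on the voted coordinates; that refinement is omitted here for brevity.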