$\ell_p$-Regression in the Arbitrary Partition Model of Communication

We consider the randomized communication complexity of the distributed $\ell_p$-regression problem in the coordinator model, for $p\in (0,2]$. In this problem, there is a coordinator and $s$ servers. The $i$-th server receives $A^i\in\{-M, -M+1, \ldots, M\}^{n\times d}$ and $b^i\in\{-M, -M+1, \ldots, M\}^n$, and the coordinator would like to find a $(1+\epsilon)$-approximate solution to $\min_{x\in\mathbb{R}^d} \|(\sum_i A^i)x - (\sum_i b^i)\|_p$. For convenience, we assume $M \leq \mathrm{poly}(nd)$. This model, where the data is additively shared across servers, is commonly referred to as the arbitrary partition model. We obtain significantly improved bounds for this problem. For $p = 2$, i.e., least squares regression, we give the first optimal bound of $\tilde{\Theta}(sd^2 + sd/\epsilon)$ bits. For $p \in (1,2)$, we obtain an $\tilde{O}(sd^2/\epsilon + sd/\mathrm{poly}(\epsilon))$ upper bound. Notably, for $d$ sufficiently large, our leading order term depends only linearly on $1/\epsilon$ rather than quadratically. We also show communication lower bounds of $\Omega(sd^2 + sd/\epsilon^2)$ for $p\in (0,1]$ and $\Omega(sd^2 + sd/\epsilon)$ for $p\in (1,2]$. Our bounds considerably improve previous bounds due to (Woodruff et al., COLT 2013) and (Vempala et al., SODA 2020).
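To make the problem setup concrete, here is a minimal Python sketch of the arbitrary partition model for the special case $p = 2$. It is not the protocol from the paper: it uses a plain Gaussian sketch-and-solve baseline as an illustrative stand-in, and it makes no attempt to match the stated communication bounds (each server sends an $m \times (d+1)$ block of real numbers). The function names `make_shares` and `sketch_and_solve`, the sketch size `m`, and the shared seed are all assumptions introduced for this example. It shows why linear sketches suit additively shared data: since $S(\sum_i A^i) = \sum_i S A^i$, each server can sketch its own share locally and the coordinator only has to add up the results.

```python
import numpy as np


def make_shares(A, b, s, M, rng):
    """Split (A, b) into s additive integer shares that sum to (A, b).

    Illustrative only: the last share is chosen to make the sum correct,
    so its entries may fall outside [-M, M].
    """
    A_shares = [rng.integers(-M, M + 1, size=A.shape) for _ in range(s - 1)]
    b_shares = [rng.integers(-M, M + 1, size=b.shape) for _ in range(s - 1)]
    A_shares.append(A - sum(A_shares))
    b_shares.append(b - sum(b_shares))
    return A_shares, b_shares


def sketch_and_solve(A_shares, b_shares, m, seed):
    """Baseline for p = 2 (assumed, not the paper's protocol).

    Every server derives the same m x n Gaussian sketch S from a shared seed,
    applies it to its own share, and sends the result. The coordinator sums
    the sketched shares and solves the small m x d least squares problem.
    """
    n = A_shares[0].shape[0]
    S = np.random.default_rng(seed).standard_normal((m, n)) / np.sqrt(m)
    SA = sum(S @ Ai for Ai in A_shares)  # equals S @ (sum of the A shares)
    Sb = sum(S @ bi for bi in b_shares)  # equals S @ (sum of the b shares)
    x, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, s, M = 400, 5, 4, 10
    A = rng.integers(-M, M + 1, size=(n, d))
    b = rng.integers(-M, M + 1, size=n)
    A_shares, b_shares = make_shares(A, b, s, M, rng)

    x_sketch = sketch_and_solve(A_shares, b_shares, m=200, seed=1)
    x_opt, *_ = np.linalg.lstsq(A.astype(float), b.astype(float), rcond=None)

    # Ratio of the sketched solution's cost to the optimal cost.
    ratio = np.linalg.norm(A @ x_sketch - b) / np.linalg.norm(A @ x_opt - b)
    print("cost ratio (sketched / optimal):", ratio)
```

The printed ratio plays the role of the $(1+\epsilon)$ factor in the problem statement; a larger `m` should bring it closer to 1, at the price of more communication per server.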
