Distributed linearly separable computation is a fundamental problem in large-scale distributed systems: it requires computing linearly separable functions of different datasets across distributed workers. This paper studies a heterogeneous distributed linearly separable computation problem with one master and N distributed workers. The linearly separable task function consists of Kc linear combinations of K messages, where each message is a function of one dataset. Existing homogeneous settings assume that each worker holds the same number of datasets and that the data assignment is carefully designed and controlled by the data center (e.g., the cyclic assignment). In contrast, we consider a more general setting with arbitrary heterogeneous data assignment across workers, where "arbitrary" means that the data assignment is given in advance and "heterogeneous" means that the workers may hold different numbers of datasets. Our objective is to characterize the fundamental tradeoff between the computable dimension of the task function and the communication cost under arbitrary heterogeneous data assignment. Under the constraint of integer communication costs, we propose a universal computing scheme and a universal converse bound for arbitrary heterogeneous data assignment by characterizing the structure of the data assignment; the two coincide in some parameter regimes. We then extend the proposed computing scheme and converse bound to the case of fractional communication costs.
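To make the problem setup concrete, the following is a minimal sketch of a toy instance with one master, N = 3 workers, K = 4 datasets, and Kc = 2 linear combinations. It is a naive uncoded baseline for illustration only, not the universal computing scheme proposed in the paper. The per-dataset function `message`, the helper names, the rule that the first worker holding a dataset handles it, and all numerical values are assumptions made for this example.

```python
# Toy instance of distributed linearly separable computation.
# Everything here (names, data, the "first holder" rule) is an illustrative
# assumption; this is a naive baseline, not the paper's proposed scheme.

def message(dataset):
    """Stand-in for the per-dataset function W_k = f_k(D_k); here, a plain sum."""
    return sum(dataset)

def worker_transmission(held, owned, datasets, F):
    """Return one symbol per row of F: the partial linear combination
    over the datasets this worker is responsible for."""
    assert owned <= held, "a worker can only use datasets it holds"
    msgs = {k: message(datasets[k]) for k in owned}
    return [sum(row[k] * msgs[k] for k in owned) for row in F]

def run_baseline(datasets, assignment, F):
    """Each dataset is handled by the first worker that holds it; the master
    adds up the received partial combinations row by row."""
    owned = [set() for _ in assignment]
    for k in range(len(datasets)):
        holder = next(n for n, held in enumerate(assignment) if k in held)
        owned[holder].add(k)

    result = [0] * len(F)
    cost = 0  # total number of symbols sent to the master
    for held, mine in zip(assignment, owned):
        if not mine:
            continue  # a worker with nothing to contribute stays silent
        sent = worker_transmission(held, mine, datasets, F)
        cost += len(sent)
        result = [r + s for r, s in zip(result, sent)]
    return result, cost

# K = 4 datasets; heterogeneous assignment: workers hold 3, 2, and 1 datasets.
datasets = [[1, 2], [3], [4, 5, 6], [7]]
assignment = [{0, 1, 2}, {2, 3}, {3}]
# Kc = 2 linear combinations of the K = 4 messages (rows of F).
F = [[1, 1, 1, 1],
     [1, 2, 3, 4]]

result, cost = run_baseline(datasets, assignment, F)
print(result, cost)
```

In this baseline every contributing worker sends Kc symbols, so the communication cost grows with the number of workers that must speak. The tradeoff studied in the paper concerns how far this cost can be reduced for a given computable dimension when the data assignment is fixed in advance.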