Linear real-valued computations over distributed datasets are common in many applications, most notably as part of machine learning inference. In particular, quantized linear computations, i.e., those whose coefficients are restricted to a predetermined set of values (such as $\pm 1$), have gained increasing interest lately due to their role in efficient, robust, or private machine learning models. Given a dataset to store in a distributed system, we wish to encode it so that every such computation can be carried out by accessing a small number of servers; this number is called the access parameter of the system. Doing so frees the remaining servers to execute other tasks and reduces the overall communication in the system. Minimizing the access parameter gives rise to an access-redundancy tradeoff, where a smaller access parameter requires more redundancy in the system, and vice versa. In this paper we study this tradeoff and provide several explicit code constructions that use covering codes in a novel way. While the connection to covering codes has been observed in the past, our results strictly outperform the state of the art and extend the framework to new families of computations.
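To make the notion of an access parameter concrete, the following toy sketch in Python encodes a dataset so that any $\pm 1$ combination can be evaluated by reading one server per pair of data symbols. It is only an illustration under simple assumptions and is not one of the constructions of this paper: the function names `encode` and `compute` are hypothetical, each server is assumed to hold a single real value, and the scheme uses pairwise sums and differences rather than covering codes.

```python
from itertools import product


def encode(data):
    """Store x1 + x2 and x1 - x2 for each consecutive pair, one value per server."""
    assert len(data) % 2 == 0, "this toy scheme assumes an even number of symbols"
    servers = []
    for i in range(0, len(data), 2):
        a, b = data[i], data[i + 1]
        servers.append(a + b)  # server 2j holds the sum of pair j
        servers.append(a - b)  # server 2j + 1 holds the difference of pair j
    return servers


def compute(servers, coeffs):
    """Evaluate sum(c_i * x_i) with c_i in {+1, -1}, reading one server per pair."""
    total = 0.0
    accessed = []
    for j in range(len(coeffs) // 2):
        c1, c2 = coeffs[2 * j], coeffs[2 * j + 1]
        # Equal signs: c1 * (a + b) = c1*a + c2*b.
        # Opposite signs: c1 * (a - b) = c1*a + c2*b.
        idx = 2 * j if c1 == c2 else 2 * j + 1
        accessed.append(idx)
        total += c1 * servers[idx]
    return total, accessed


data = [3.0, -1.5, 2.0, 4.0]
servers = encode(data)
for coeffs in product([1, -1], repeat=len(data)):
    value, accessed = compute(servers, coeffs)
    direct = sum(c * x for c, x in zip(coeffs, data))
    assert abs(value - direct) < 1e-9
    assert len(accessed) == len(data) // 2
```

In this toy case the number of stored values equals the dataset size, while every $\pm 1$ computation reads only half of the servers.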