We consider gradient coding in the presence of an adversary that controls so-called malicious workers, which try to corrupt the computations. Previous works propose using MDS codes to treat the inputs of the malicious workers as errors and to correct them using the error-correction properties of the code. This comes at the expense of increased replication, i.e., the number of workers that compute each partial gradient. In this work, we reduce replication by proposing a method that detects the erroneous inputs from the malicious workers, thereby transforming them into erasures. For $s$ malicious workers, our solution can reduce the replication of each partial gradient from $2s+1$ to $s+1$, at the expense of only $s$ additional computations at the main node and additional rounds of light communication between the main node and the workers. We give fundamental limits of the general framework for fractional repetition data allocation. Our scheme is optimal in terms of replication and local computation, but incurs a communication cost that is, asymptotically in the size of the dataset, a multiplicative factor away from the derived bound.
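To make the detection idea concrete, the following is a minimal sketch, not the scheme of the paper, of how a main node could resolve one group of $s+1$ replicas containing at most $s$ malicious workers. It assumes that every worker in the group reports its claimed per-sample gradients for the same samples and that honest workers report exact, deterministic values. The function `resolve_group` and the callback `true_partial_gradient` are hypothetical names introduced for illustration. Comparing full responses here stands in for the light communication rounds mentioned in the abstract, which this sketch does not attempt to reproduce.

```python
import numpy as np

def resolve_group(responses, true_partial_gradient):
    """Resolve one replication group (illustrative sketch).

    responses: dict worker_id -> array of shape (m, d) holding that worker's
        claimed gradients for the same m samples.
    true_partial_gradient: hypothetical callback; given a sample index, it
        returns the gradient the main node computes locally.
    Returns (sum of the m gradients, number of local computations used).
    """
    active = dict(responses)
    local_computations = 0
    while True:
        ids = list(active)
        ref = ids[0]
        # Look for any remaining worker whose response differs from the reference.
        rival = next(
            (w for w in ids[1:] if not np.array_equal(active[w], active[ref])),
            None,
        )
        if rival is None:
            # All remaining workers agree. Honest workers are never removed,
            # so with at least one honest replica the agreed value is correct.
            return active[ref].sum(axis=0), local_computations
        # First sample index on which the two workers disagree.
        differs = np.any(active[ref] != active[rival], axis=1)
        i = int(np.argmax(differs))
        # One local computation settles the dispute.
        truth = true_partial_gradient(i)
        local_computations += 1
        for w in (ref, rival):
            if not np.array_equal(active[w][i], truth):
                del active[w]  # detected liar: treat its input as an erasure

# Toy usage: s = 2 malicious workers, s + 1 = 3 replicas, 4 samples, dimension 3.
s = 2
rng = np.random.default_rng(0)
G = rng.normal(size=(4, 3))      # true per-sample gradients (synthetic data)
bad_a = G.copy(); bad_a[2] += 1.0
bad_b = G.copy(); bad_b[0] -= 5.0
responses = {0: bad_a, 1: G.copy(), 2: bad_b}
grad, n_local = resolve_group(responses, lambda i: G[i])
```

Each dispute involves two responses that differ on sample `i`, so at least one of them differs from the locally computed value and is removed. Under the stated assumptions this bounds the number of local computations by $s$ per group, which mirrors the trade-off stated in the abstract: replication $s+1$ instead of $2s+1$, paid for with at most $s$ extra computations at the main node.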