Coded polynomial aggregation (CPA) in distributed computing systems enables the master to directly recover a weighted aggregation of polynomial computations without individually decoding each term, thereby reducing the number of required worker responses. However, existing CPA schemes are restricted to an idealized setting in which the system cannot tolerate stragglers. In this paper, we extend CPA to straggler-aware distributed computing systems with a pre-specified non-straggler pattern, where exact recovery is required for every set in a given collection of admissible non-straggler sets. Our main results show that exact recovery of the desired aggregation is achievable with fewer worker responses than are required by polynomial codes based on individual decoding, and that feasibility is characterized by the intersection structure of the non-straggler patterns. In particular, we establish necessary and sufficient conditions for exact recovery in straggler-aware CPA, and we identify an intersection-size threshold that is sufficient to guarantee exact recovery. When the number of admissible non-straggler sets is sufficiently large, we further show that this threshold is also necessary in a generic sense. We also provide an explicit construction of feasible CPA schemes whenever the intersection size exceeds the derived threshold. Finally, simulations corroborate our theoretical results by demonstrating a sharp feasibility transition at the predicted intersection threshold.
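As a toy illustration of why aggregation can need fewer responses than individual decoding, consider the following minimal sketch. It is not the construction of this paper: it assumes two data points, the polynomial f(x) = x^2, equal aggregation weights, rational arithmetic, and no stragglers. The names `encode`, `aggregate`, `ALPHAS`, and `COEFFS` are hypothetical. The composed polynomial h(z) = f(u(z)) has degree 2, so interpolating it (individual decoding) would need three worker responses, whereas the two evaluation points and decoding coefficients below are chosen so that c1 + c2 = 2, c1\*a1 + c2\*a2 = 0, and c1\*a1^2 + c2\*a2^2 = 2, which makes c1\*h(a1) + c2\*h(a2) = h(-1) + h(1) for every degree-2 polynomial h.

```python
from fractions import Fraction as F

def encode(x1, x2, z):
    # Degree-1 Lagrange encoder u(z) with u(-1) = x1 and u(1) = x2 (assumed setup).
    return x1 * (1 - z) / 2 + x2 * (1 + z) / 2

def f(x):
    # Polynomial computed by every worker (assumed to be x^2 for this sketch).
    return x * x

# Worker evaluation points and master decoding coefficients (hypothetical values,
# solved by hand from the three moment conditions stated in the lead-in).
ALPHAS = (F(2), F(-1, 2))
COEFFS = (F(2, 5), F(8, 5))

def aggregate(x1, x2):
    # Each worker returns f applied to its encoded share.
    results = [f(encode(x1, x2, a)) for a in ALPHAS]
    # The master combines two responses linearly; it never recovers f(x1) or f(x2) alone.
    return sum(c * r for c, r in zip(COEFFS, results))

if __name__ == "__main__":
    print(aggregate(3, 5) == f(3) + f(5))
```

Only the sum f(x1) + f(x2) is recovered here, from two responses instead of three. Handling general weights, higher degrees, and admissible non-straggler sets is exactly what the conditions in the paper address, and this sketch does not attempt it.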