In this paper, we present a coded computation (CC) scheme for distributed computation of the inference phase of machine learning (ML) tasks, specifically image classification. Building upon Agrawal et al.~(2022), the proposed scheme combines the strengths of deep learning and Lagrange interpolation to mitigate the effect of straggling workers, and recovers approximate results with reasonable accuracy using the outputs of any $R$ out of $N$ workers, where $R\leq N$. The scheme guarantees a minimum recovery threshold $R$ for non-polynomial problems, and $R$ can be adjusted as a tunable system parameter. Moreover, unlike existing schemes, our scheme remains flexible with respect to worker availability and system design. We propose two system designs for our CC scheme that allow the computational load to be distributed between the master and the workers according to the accessibility of the input data. Our experimental results show that our scheme outperforms state-of-the-art CC schemes on image classification tasks, and pave the way for designing new schemes for distributed computation of general ML classification tasks.
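To make the role of Lagrange interpolation and the recovery threshold $R$ concrete, the following is a minimal sketch of classical Lagrange coded computing on a toy polynomial function. It is not the scheme proposed in this paper: here the worker function `f` is an assumed degree-2 polynomial, so recovery is exact and $R$ is fixed by the degree, whereas the paper targets non-polynomial models with approximate recovery and a tunable $R$. All names (`lagrange_basis`, `encode`, `decode`, `betas`, `alphas`) and the values of `K` and `N` are hypothetical and chosen only for illustration.

```python
import numpy as np

def lagrange_basis(nodes, x):
    """Evaluate every Lagrange basis polynomial defined by `nodes` at point x."""
    nodes = np.asarray(nodes, dtype=float)
    basis = np.ones(len(nodes))
    for j in range(len(nodes)):
        for m in range(len(nodes)):
            if m != j:
                basis[j] *= (x - nodes[m]) / (nodes[j] - nodes[m])
    return basis

def encode(inputs, betas, alphas):
    """Worker i receives u(alpha_i), where u interpolates the inputs at betas."""
    return np.array([lagrange_basis(betas, a) @ inputs for a in alphas])

def decode(alphas_done, outputs_done, betas):
    """Interpolate f(u(z)) through the finished workers and evaluate it at each beta."""
    return np.array([lagrange_basis(alphas_done, b) @ outputs_done for b in betas])

K, N = 3, 8                      # K inputs, N workers (assumed toy sizes)
f = lambda x: x ** 2             # assumed toy "model": elementwise degree-2 polynomial
R = 2 * (K - 1) + 1              # threshold for a degree-2 f: deg(f) * (K - 1) + 1

betas = np.arange(K, dtype=float)        # interpolation points for the inputs
alphas = np.linspace(-1.0, 3.0, N)       # distinct evaluation points, one per worker
inputs = np.array([[1.0, 2.0], [3.0, -1.0], [0.5, 4.0]])

coded = encode(inputs, betas, alphas)    # one coded input per worker
outputs = f(coded)                       # each worker applies f to its coded input

done = [0, 2, 3, 5, 7]                   # any R workers that finished; the rest straggle
recovered = decode(alphas[done], outputs[done], betas)
```

In this sketch the master can ignore any `N - R` stragglers, because `f(u(z))` is a polynomial of degree `R - 1` and is therefore determined by any `R` of its evaluations. For a neural network `f` is not a polynomial, which is why the scheme described above recovers only approximate results.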