We generalize the leverage score sampling sketch for $\ell_2$-subspace embeddings to accommodate sampling subsets of the transformed data, so that the sketching approach is appropriate for distributed settings. This is then used to derive an approximate coded computing approach for first-order methods, known as gradient coding, to accelerate linear regression in the presence of failures in distributed computational networks, \textit{i.e.} stragglers. We replicate the data across the distributed network to attain the approximation guarantees through the induced sampling distribution. The significance and main contribution of this work is that it unifies randomized numerical linear algebra with approximate coded computing, while attaining an induced $\ell_2$-subspace embedding through uniform sampling. Unlike the subsampled randomized Hadamard transform, the transition to uniform sampling is made without applying a random projection. Furthermore, by incorporating this technique into coded computing, our scheme becomes an iterative sketching approach to approximately solving linear regression. We also propose weighting when sketching takes place through sampling with replacement, for further compression.
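As a rough illustration of sampling subsets of rows by leverage scores, the sketch below samples contiguous row blocks with replacement. Each block is drawn with probability proportional to the sum of its rows' leverage scores, and each pick is rescaled by $1/\sqrt{r p_k}$. This is a minimal sketch under stated assumptions and not the scheme described above. It omits the data transformation and the replication that induce uniform sampling, the contiguous block partition is an assumption, and the helper names (`block_leverage_probs`, `sketch_stacked`, `sketch_merged`) are hypothetical. The merged variant shows one possible reading of the weighting idea: a block drawn $c_k$ times is kept once, weighted by $\sqrt{c_k/(r p_k)}$.

```python
import numpy as np

def block_leverage_probs(A, num_blocks):
    """Distribution over contiguous row blocks, proportional to block leverage scores.
    (Hypothetical helper; contiguous partitioning is an assumption.)"""
    Q, _ = np.linalg.qr(A)              # thin QR: columns of Q span range(A)
    row_scores = np.sum(Q**2, axis=1)   # row leverage scores; they sum to rank(A)
    blocks = np.array_split(np.arange(A.shape[0]), num_blocks)
    block_scores = np.array([row_scores[idx].sum() for idx in blocks])
    return blocks, block_scores / block_scores.sum()

def sketch_stacked(A, b, blocks, probs, picks):
    """Stack every sampled block (duplicates included), rescaled by 1/sqrt(r p_k)."""
    r = len(picks)
    SA, Sb = [], []
    for k in picks:
        w = 1.0 / np.sqrt(r * probs[k])
        SA.append(w * A[blocks[k]])
        Sb.append(w * b[blocks[k]])
    return np.vstack(SA), np.concatenate(Sb)

def sketch_merged(A, b, blocks, probs, picks):
    """Keep each distinct sampled block once, weighted by sqrt(count / (r p_k)).
    (Assumed reading of 'weighting for further compression'.)"""
    r = len(picks)
    ks, counts = np.unique(picks, return_counts=True)
    SA, Sb = [], []
    for k, c in zip(ks, counts):
        w = np.sqrt(c / (r * probs[k]))
        SA.append(w * A[blocks[k]])
        Sb.append(w * b[blocks[k]])
    return np.vstack(SA), np.concatenate(Sb)

def residual(A, b, x):
    return np.linalg.norm(A @ x - b)

# Synthetic overdetermined regression problem.
rng = np.random.default_rng(0)
n, d, K, r = 2000, 10, 200, 100          # rows, columns, blocks, blocks drawn
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

blocks, probs = block_leverage_probs(A, K)
picks = rng.choice(K, size=r, replace=True, p=probs)   # sampling with replacement

x_opt = np.linalg.lstsq(A, b, rcond=None)[0]
SA, Sb = sketch_stacked(A, b, blocks, probs, picks)
x_stacked = np.linalg.lstsq(SA, Sb, rcond=None)[0]
SA_m, Sb_m = sketch_merged(A, b, blocks, probs, picks)
x_merged = np.linalg.lstsq(SA_m, Sb_m, rcond=None)[0]

print("residual ratio:", residual(A, b, x_stacked) / residual(A, b, x_opt))
print("rows stacked vs merged:", SA.shape[0], SA_m.shape[0])
```

A block drawn $c_k$ times contributes $\frac{c_k}{r p_k} A_k^\top A_k$ to the sketched normal matrix in both versions. Both sketches therefore give the same least-squares solution, but the merged sketch never has more rows than the stacked one.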
翻译:本文推广了用于$\ell_2$-子空间嵌入的杠杆得分采样草图方法,以支持对变换后数据的子集进行采样,从而使该草图方法适用于分布式环境。随后,该方法被用于推导一阶方法的近似编码计算方案,即梯度编码,以加速分布式计算网络中节点故障(即掉队者)情况下的线性回归。我们将数据复制到分布式网络中,通过所诱导的采样分布来保证近似精度。本研究的重要意义和主要贡献在于,它将随机数值线性代数与近似编码计算统一起来,同时通过均匀采样实现了诱导的$\ell_2$-子空间嵌入。这种到均匀采样的过渡无需应用随机投影——例如子采样随机哈达玛变换中的做法。此外,通过将该技术融入编码计算,我们的方案成为一种迭代草图方法,用于近似求解线性回归。我们还提出了在通过有放回采样进行草图化时的加权策略,以实现进一步压缩。