In this paper, we propose an over-the-air (OTA)-based approach to distributed matrix-vector multiplication in the context of distributed machine learning (DML). With OTA computation, column-wise partitioning of a large matrix allows the workload to be distributed among workers (i.e., local computing nodes) according to their computing capabilities. Because OTA computation requires no additional bandwidth as workers are added, the system remains scalable when the number of workers is increased to mitigate the impact of slow workers, known as stragglers. Even so, some workers may still experience deep fading and become stragglers that cannot transmit their results. By analyzing the mean squared error (MSE), we show that incorporating more workers in the OTA-based approach reduces the MSE without additional radio resources. We also introduce an analog coding scheme to further improve performance and compare it with conventional coded multiplication (CM) schemes. Simulations show that the OTA-based approach achieves performance comparable to that of CM schemes while potentially requiring fewer radio resources.
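To make the column-wise partitioning concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It relies on the identity that, when the columns of A and the matching entries of x are split into blocks, Ax equals the sum of the per-block partial products, which is what lets a superposition channel add the workers' results. The function names `split_columns` and `ota_matvec`, the `weights` argument standing in for computing capabilities, the idealized channel (a plain sum plus additive Gaussian noise), and the modeling of a deep-faded worker as a missing contribution are all assumptions made for illustration.

```python
import numpy as np


def split_columns(n_cols, weights):
    """Assign contiguous column ranges to workers in proportion to `weights`.

    `weights` is a hypothetical stand-in for per-worker computing capability.
    Returns a list of (start, stop) index pairs that cover 0..n_cols.
    """
    weights = np.asarray(weights, dtype=float)
    # Cumulative share of the columns, rounded to integer boundaries.
    bounds = np.round(np.cumsum(weights) / weights.sum() * n_cols).astype(int)
    starts = np.concatenate(([0], bounds[:-1]))
    return list(zip(starts, bounds))


def ota_matvec(A, x, weights, noise_std=0.0, active=None, rng=None):
    """Approximate A @ x as a noisy sum of per-worker partial products.

    Assumed channel model (illustrative only): the receiver observes the
    plain sum of the transmitted vectors plus Gaussian noise. A worker with
    active[k] == False is treated as a straggler that transmits nothing.
    """
    rng = np.random.default_rng() if rng is None else rng
    ranges = split_columns(A.shape[1], weights)
    if active is None:
        active = [True] * len(ranges)
    received = np.zeros(A.shape[0])
    for (lo, hi), is_active in zip(ranges, active):
        if is_active:
            # Worker k multiplies its column block by its slice of x;
            # the channel superposes (adds) the transmitted results.
            received += A[:, lo:hi] @ x[lo:hi]
    return received + noise_std * rng.standard_normal(A.shape[0])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 10))
    x = rng.standard_normal(10)
    y_hat = ota_matvec(A, x, weights=[1, 2, 2], noise_std=0.1, rng=rng)
    print("squared error:", np.sum((y_hat - A @ x) ** 2))
```

With `noise_std=0.0` and all workers active, the sketch reproduces `A @ x` up to floating-point error. Setting an entry of `active` to `False` removes that worker's column block from the sum, which is how a missing straggler shows up as estimation error in this simplified model.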