The growing demand for computational power in big data and machine learning has driven the development of distributed training methodologies. Among these, peer-to-peer (P2P) networks offer advantages such as improved scalability and fault tolerance. However, they face challenges in resource consumption, cost, and communication overhead as the number of participating peers grows. In this paper, we introduce a novel architecture that combines serverless computing with P2P networks for distributed training, and we present a method for efficient parallel gradient computation under resource constraints. Our results show that gradient computation time improves by up to 97.34\% compared with conventional P2P distributed training methods. This gain comes at a price: our cost analysis shows that the serverless architecture can be up to 5.4 times more expensive than instance-based architectures. These higher costs accompany marked improvements in computation time, particularly in resource-constrained scenarios. Despite this cost-time trade-off, the serverless approach remains promising because of its pay-as-you-go model. Through dynamic resource allocation, it enables faster training and better resource utilization, making it a candidate for a wide range of machine learning applications.
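The abstract does not spell out how the parallel gradient computation is carried out, so the following is only a minimal sketch of the general idea, not the method of the paper. It assumes that a peer splits its local batch into shards, computes a partial gradient for each shard in a separate worker, and then averages the results. A local thread pool stands in for the serverless function invocations, the model is a plain linear regressor with squared-error loss, and all names (`shard_gradient`, `split_into_shards`, `parallel_gradient`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def shard_gradient(weights, shard):
    """Sum of squared-error gradients over one shard of (features, target) pairs."""
    grad = [0.0] * len(weights)
    for features, target in shard:
        error = sum(w * x for w, x in zip(weights, features)) - target
        for i, x in enumerate(features):
            grad[i] += 2.0 * error * x
    return grad


def split_into_shards(batch, num_shards):
    """Split a batch into at most num_shards contiguous, non-empty shards."""
    size = -(-len(batch) // num_shards)  # ceiling division
    return [batch[i:i + size] for i in range(0, len(batch), size)]


def parallel_gradient(weights, batch, num_workers=4):
    """Mean gradient over the batch, computed shard by shard in parallel.

    Assumption: each worker plays the role of one serverless function call.
    """
    if not batch:
        raise ValueError("batch must not be empty")
    shards = split_into_shards(batch, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(lambda s: shard_gradient(weights, s), shards))
    # Sum the per-shard gradients component-wise, then average over the batch.
    total = [sum(column) for column in zip(*partials)]
    return [g / len(batch) for g in total]
```

Because the per-shard gradients are summed before dividing by the batch size, the result matches the full-batch mean gradient up to floating-point rounding, regardless of how many workers are used. In an actual serverless deployment each shard would be sent to a remote function, so the per-invocation billing behind the cost-time trade-off discussed above would apply to every shard.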