With the increasing demand for high-performance, high-efficiency computing, cloud computing, and serverless computing in particular, has become a research hotspot in recent years. Meanwhile, MapReduce, a popular big data processing model in industry, has been widely applied in various fields. Inspired by the serverless Function as a Service framework and by the high concurrency and robustness of the MapReduce programming model, this paper focuses on combining the two to reduce execution time and increase efficiency for the word frequency counting task. To this end, the paper uses a MapReduce programming model built on a serverless computing platform to determine the optimal number of Map functions and Reduce functions for a particular task. For the same workload, extensive experiments show that execution time decreases and the overall efficiency of the program improves, at different rates, as the number of Map functions and Reduce functions increases. This paper suggests that identifying the optimal number of Map and Reduce functions can help corporations and programmers find the most efficient solutions.
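To make the setup concrete, the following is a minimal local sketch of word frequency counting with a configurable number of Map and Reduce functions. It is not the paper's implementation: a thread pool stands in for serverless function invocations, and all names (`split_into_chunks`, `map_fn`, `reduce_fn`, `word_count`, `num_mappers`, `num_reducers`) are hypothetical and chosen only for illustration.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(words, num_chunks):
    """Split the word list into num_chunks contiguous, nearly equal slices."""
    size, extra = divmod(len(words), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(words[start:end])
        start = end
    return chunks


def map_fn(chunk, num_reducers):
    """Count words in one chunk and partition the counts by reducer index."""
    partitions = [Counter() for _ in range(num_reducers)]
    for word in chunk:
        # crc32 is deterministic across runs, unlike the built-in hash() for str.
        idx = zlib.crc32(word.encode("utf-8")) % num_reducers
        partitions[idx][word] += 1
    return partitions


def reduce_fn(partial_counts):
    """Merge the partial counts that were routed to one reducer."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def word_count(text, num_mappers, num_reducers):
    """Run the map, shuffle, and reduce phases with the given function counts."""
    words = text.lower().split()
    chunks = split_into_chunks(words, num_mappers)

    # Map phase: one worker per chunk (assumption: threads emulate functions).
    with ThreadPoolExecutor(max_workers=num_mappers) as pool:
        mapped = list(pool.map(lambda c: map_fn(c, num_reducers), chunks))

    # Shuffle phase: collect the r-th partition from every mapper.
    shuffled = [[m[r] for m in mapped] for r in range(num_reducers)]

    # Reduce phase: one worker per partition.
    with ThreadPoolExecutor(max_workers=num_reducers) as pool:
        reduced = list(pool.map(reduce_fn, shuffled))

    # Partitions hold disjoint words, so merging them gives the final counts.
    result = Counter()
    for part in reduced:
        result.update(part)
    return result
```

Calling `word_count(text, num_mappers, num_reducers)` with different values of the two counts should return the same frequencies, so in a sketch like this only the execution time would vary with the configuration, which is the quantity the experiments compare.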