The Function-as-a-Service (FaaS) computing model has recently seen significant growth, especially for highly scalable, event-driven applications. Its ease of deployment and cost-efficient, fine-grained billing make FaaS highly attractive for big data applications. However, the stateless nature of serverless platforms poses major challenges for stateful, I/O-intensive workloads: these platforms lack native support for stateful execution, state sharing, and inter-function communication. In this paper, we explore the feasibility of performing stateful big data analytics on serverless platforms and of improving the I/O throughput of functions by using modern storage technologies such as Intel Optane DC Persistent Memory (PMEM). To this end, we propose Marvel, an end-to-end architecture built on top of the popular serverless platform Apache OpenWhisk and Apache Hadoop. Marvel makes two main contributions: (1) it enables stateful function execution on OpenWhisk by maintaining state information in an in-memory caching layer; and (2) it provides access to PMEM-backed HDFS storage for faster I/O performance. Our evaluation shows that Marvel reduces the overall execution time of big data applications by up to 86.6% compared to current MapReduce implementations on AWS Lambda.
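To make the first contribution concrete, the following is a minimal sketch of how otherwise stateless functions might share intermediate MapReduce state through an in-memory caching layer. It is an illustration under stated assumptions, not Marvel's implementation: `InMemoryStateStore`, `map_action`, `reduce_action`, and the key layout are hypothetical names, and the store is a plain in-process dictionary standing in for an external cache that would outlive individual function invocations.

```python
from collections import Counter


class InMemoryStateStore:
    """Hypothetical stand-in for an external in-memory caching layer."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


# Assumption: in a real deployment the store lives outside the function
# containers, so state survives between invocations.
STORE = InMemoryStateStore()


def map_action(args, store=STORE):
    """Count the words in one input chunk and save the partial counts."""
    counts = Counter(args["text"].split())
    key = f"{args['job_id']}/map/{args['chunk_id']}"
    store.put(key, dict(counts))
    return {"job_id": args["job_id"], "chunk_id": args["chunk_id"]}


def reduce_action(args, store=STORE):
    """Merge the partial counts that the map functions left in the store."""
    total = Counter()
    for chunk_id in args["chunk_ids"]:
        key = f"{args['job_id']}/map/{chunk_id}"
        total.update(store.get(key, {}))
    return {"counts": dict(total)}


# Usage: two map invocations followed by one reduce invocation.
map_action({"job_id": "j1", "chunk_id": 0, "text": "a b a"})
map_action({"job_id": "j1", "chunk_id": 1, "text": "b c"})
result = reduce_action({"job_id": "j1", "chunk_ids": [0, 1]})
```

Each handler takes a dictionary of parameters and returns a dictionary, which mirrors the general shape of a Python serverless action. The point of the sketch is only that the reduce step never receives the map outputs directly; it recovers them from the shared store by key.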