The function-as-a-service (FaaS) computing model has recently seen significant growth, especially for highly scalable, event-driven applications. Its ease of deployment and cost-efficient, fine-grained billing make FaaS highly attractive for big data applications. However, the stateless nature of serverless platforms poses major challenges for stateful, I/O-intensive workloads: these platforms lack native support for stateful execution, state sharing, and inter-function communication. In this paper, we explore the feasibility of performing stateful big data analytics on serverless platforms and of improving the I/O throughput of functions by using modern storage technologies such as Intel Optane DC Persistent Memory (PMEM). To this end, we propose Marvel, an end-to-end architecture built on top of Apache OpenWhisk, a popular serverless platform, and Apache Hadoop. Marvel makes two main contributions: (1) it enables stateful function execution on OpenWhisk by maintaining state information in an in-memory caching layer; and (2) it provides access to PMEM-backed HDFS storage for faster I/O performance. Our evaluation shows that Marvel reduces the overall execution time of big data applications by up to 86.6% compared to current MapReduce implementations on AWS Lambda.
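To make the first contribution concrete, the following is a minimal sketch of how stateless functions might share intermediate state through an in-memory caching layer. It is not Marvel's implementation: `InMemoryStateCache`, `map_action`, `reduce_action`, and the key layout are hypothetical names chosen for illustration, and the module-level cache only simulates what would in practice be a separate caching service, since function containers do not share process memory. The handlers take and return dictionaries, in the style of OpenWhisk Python actions.

```python
from collections import Counter


class InMemoryStateCache:
    """Hypothetical stand-in for an external in-memory caching layer."""

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key] = value

    def get(self, key, default=None):
        return self._store.get(key, default)

    def keys_with_prefix(self, prefix):
        # Sorted so that iteration order is deterministic.
        return sorted(k for k in self._store if k.startswith(prefix))


# Assumption: a single shared cache instance simulates a remote cache service.
CACHE = InMemoryStateCache()


def map_action(args):
    """Stateless mapper: counts words in one input split and stores the
    partial result in the cache instead of returning it to the caller."""
    key = f"{args['job']}/map/{args['split_id']}"
    CACHE.put(key, dict(Counter(args["text"].split())))
    return {"stored": key}


def reduce_action(args):
    """Stateless reducer: reads every partial result for the job from the
    cache and merges the counts."""
    total = Counter()
    for key in CACHE.keys_with_prefix(f"{args['job']}/map/"):
        total.update(CACHE.get(key))
    return {"counts": dict(total)}
```

In this sketch, two mapper invocations such as `map_action({"job": "wc", "split_id": 0, "text": "a b a"})` and `map_action({"job": "wc", "split_id": 1, "text": "b c"})` leave their partial counts in the cache, and a later `reduce_action({"job": "wc"})` merges them without the functions ever communicating directly.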