In recent years, data-intensive applications have been increasingly deployed on cloud systems. Such applications use significant compute, memory, and I/O resources to process large volumes of data. Optimizing the performance and cost-efficiency of such applications is a non-trivial problem. The problem becomes even more challenging with the increasing use of containers, which are popular for their lower operational overhead and faster boot times but provide weaker resource assurances to the hosted applications. In this paper, we study two containerized data-intensive applications with very different performance objectives and resource needs, running in Docker containers on cloud servers with Intel Xeon E5 and AMD EPYC Rome multi-core processors under a range of CPU, memory, and I/O configurations. Primary findings from our experiments include: 1) allocating multiple cores to a compute-intensive application can improve performance, but only if the cores do not contend for the same caches, and the optimal core count depends on the specific workload; 2) allocating more memory to a memory-intensive application than its deterministic data workload requires does not further improve performance; however, 3) running multiple such memory-intensive containers on the same server can cause cache and memory bus contention, resulting in significant and volatile performance degradation. The comparative observations on Intel and AMD servers provide insights into the trade-offs between larger numbers of distributed chiplets interconnected with higher-speed buses (AMD) and larger numbers of centrally integrated cores and caches with lower-speed buses (Intel). For the two types of applications studied, the more distributed caches and faster data buses benefited the deployment of larger numbers of containers.