Data movement in large-scale computing facilities (from compute nodes to data nodes) is one of the major contributors to high cost and energy consumption. To address it, in-storage processing (ISP) within storage devices such as solid-state drives (SSDs) has been actively explored. The introduction of computational storage drives (CSDs) enabled ISP in the same form factor as regular SSDs, making it easy to replace the SSDs in traditional compute nodes. With CSDs, host systems can offload operations such as search, filter, and count. However, commercial CSDs differ in hardware resources and performance characteristics, so building a CSD-based storage system within a compute node requires careful consideration of hardware, performance, and workload characteristics. As a result, storage architects hesitate to build storage systems based on CSDs, because no tools exist to determine whether CSD-based compute nodes meet performance requirements better than traditional SSD-based nodes. In this work, we propose CSDPlan, an analytical model-based storage capacity planner that helps system architects build performance-effective CSD-based compute nodes. Our model takes into account the performance characteristics of the host system, the targeted workloads, and the hardware and performance characteristics of the CSDs to be deployed, and it recommends an optimal configuration in terms of the number of CSDs per compute node. Furthermore, CSDPlan estimates and reduces the total cost of ownership (TCO) of building a CSD-based compute node. To evaluate the efficacy of CSDPlan, we selected two commercially available CSDs and four representative big data analysis workloads.