As the capacity of Solid-State Drives (SSDs) continues to grow while their cost falls, SSD clusters are now widely deployed as part of hybrid storage systems in scenarios such as cloud computing and big data processing. Despite this rapid development, the performance of SSD clusters remains largely under-investigated, which leads to sub-optimal deployments in practice. To address this issue, we conduct extensive empirical studies to build a comprehensive understanding of SSD cluster performance in diverse settings. We configure a real SSD cluster, gather trace data generated by commonly used benchmarks, and apply analytical methods to analyse the cluster's performance under different configurations. In particular, we build regression models to make performance more predictable across a broader range of configurations, and we investigate the correlations between influential factors and performance metrics for different numbers of nodes, which reveal the high scalability of the SSD cluster. Additionally, we inspect the cluster's network bandwidth to explain the performance bottleneck. Finally, we summarise the knowledge gained to benefit SSD cluster deployment in practice.