Cloud providers have introduced pricing models that incentivize long-term commitments of compute capacity. These commitments give providers guaranteed revenue for their investments in data centers and computing infrastructure. However, they expose cloud customers to demand risk if expected future demand does not materialize. While existing studies cover theoretical techniques for optimizing performance, latency, and cost, relatively little has been reported on the trade-offs between cost savings and demand risk in compute commitments for large-scale cloud services. We characterize cloud compute demand based on an extensive three-year study of the Snowflake Data Cloud, which includes data warehousing, data lakes, data science, data engineering, and other workloads across multiple clouds. We quantify capacity demand drivers from user workloads, hardware generational improvements, and software performance improvements. Using this data, we formulate a series of practical optimizations that maximize capacity availability and minimize costs for the cloud customer.