Cloud computing has revolutionized the way organizations manage their IT infrastructure, but it has also introduced new challenges, such as managing cloud costs. The rapid adoption of artificial intelligence (AI) and machine learning (ML) workloads has further amplified these challenges, with GPU compute now representing 40-60\% of technical budgets for AI-focused organizations. This paper provides a comprehensive review of cloud and AI infrastructure cost optimization techniques, covering traditional cloud pricing models, resource allocation strategies, and emerging approaches for managing AI/ML workloads. We examine the dramatic cost reductions in large language model (LLM) inference, which has decreased by approximately 10x annually since 2021, and explore techniques such as model quantization, GPU instance selection, and inference optimization. Real-world case studies from Amazon Prime Video, Pinterest, Cloudflare, and Netflix showcase practical applications of these techniques. Our analysis reveals that organizations can achieve 50-90\% cost savings through strategic optimization approaches. Future research directions in automated optimization, sustainability, and AI-specific cost management are proposed to advance the state of the art in this rapidly evolving field.