Cloud computing has revolutionized the way organizations manage their IT infrastructure, but it has also introduced new challenges, such as managing cloud costs. The rapid adoption of artificial intelligence (AI) and machine learning (ML) workloads has further amplified these challenges, with GPU compute now representing 40-60\% of technical budgets for AI-focused organizations. This paper provides a comprehensive review of cloud and AI infrastructure cost optimization techniques, covering traditional cloud pricing models, resource allocation strategies, and emerging approaches for managing AI/ML workloads. We examine the dramatic cost reductions in large language model (LLM) inference, which has decreased by approximately 10x annually since 2021, and explore techniques such as model quantization, GPU instance selection, and inference optimization. Real-world case studies from Amazon Prime Video, Pinterest, Cloudflare, and Netflix showcase practical applications of these techniques. Our analysis reveals that organizations can achieve 50-90\% cost savings through strategic optimization approaches. Future research directions in automated optimization, sustainability, and AI-specific cost management are proposed to advance the state of the art in this rapidly evolving field.