Cloud computing has revolutionized the provisioning of computing resources, offering scalable, flexible, and on-demand services to meet the diverse requirements of modern applications. At the heart of efficient cloud operations are job scheduling and resource management, which are critical for optimizing system performance and ensuring timely and cost-effective service delivery. However, the dynamic and heterogeneous nature of cloud environments presents significant challenges for these tasks, as workloads and resource availability can fluctuate unpredictably. Traditional approaches, including heuristic and meta-heuristic algorithms, often struggle to adapt to these real-time changes due to their reliance on static models or predefined rules. Deep Reinforcement Learning (DRL) has emerged as a promising solution to these challenges by enabling systems to learn and adapt policies based on continuous observations of the environment, facilitating intelligent and responsive decision-making. This survey provides a comprehensive review of DRL-based algorithms for job scheduling and resource management in cloud computing, analyzing their methodologies, performance metrics, and practical applications. We also highlight emerging trends and future research directions, offering valuable insights into leveraging DRL to advance both job scheduling and resource management in cloud computing.