Cloud data centers face growing pressure to reduce operational energy consumption as big data workloads increase in scale and complexity. This paper presents a workload-aware, energy-efficient scheduling framework that profiles CPU utilization, memory demand, and storage I/O behavior to guide virtual machine placement decisions. By combining historical execution logs with real-time telemetry, the proposed system predicts the energy and performance impact of candidate placements and enables adaptive consolidation while preserving service-level agreement (SLA) compliance. The framework is evaluated using representative Hadoop MapReduce, Spark MLlib, and ETL workloads deployed on a multi-node cloud testbed. Experimental results show consistent energy savings of 15 to 20 percent compared to a baseline scheduler, with negligible performance degradation. These findings highlight workload profiling as a practical and scalable strategy for improving the sustainability of cloud-based big data processing environments.
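To make the placement idea concrete, the following is a minimal sketch of profile-driven, energy-aware virtual machine placement. It is not the framework's implementation. It assumes a simple linear power model in CPU utilization, treats an empty host as powered off, and reserves a fixed utilization headroom as a stand-in for SLA protection. The names `VM`, `Host`, `place`, and `headroom`, and all numeric values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VM:
    name: str
    cpu: float  # profiled CPU demand, in cores (illustrative unit)
    mem: float  # profiled memory demand, in GiB
    io: float   # profiled storage I/O demand, in MB/s


@dataclass
class Host:
    name: str
    cpu_cap: float
    mem_cap: float
    io_cap: float
    p_idle: float  # watts when powered on but idle (assumed value)
    p_max: float   # watts at full CPU utilization (assumed value)
    vms: List[VM] = field(default_factory=list)

    def used(self, attr: str) -> float:
        """Total demand of the given resource across hosted VMs."""
        return sum(getattr(vm, attr) for vm in self.vms)

    def power(self, extra_cpu: float = 0.0) -> float:
        """Assumed linear power model; an empty host counts as off (0 W)."""
        if not self.vms and extra_cpu <= 0:
            return 0.0
        cpu = self.used("cpu") + extra_cpu
        return self.p_idle + (self.p_max - self.p_idle) * cpu / self.cpu_cap

    def fits(self, vm: VM, headroom: float) -> bool:
        """Check CPU, memory, and I/O capacity, keeping `headroom` spare."""
        limit = 1.0 - headroom
        return (
            self.used("cpu") + vm.cpu <= limit * self.cpu_cap
            and self.used("mem") + vm.mem <= limit * self.mem_cap
            and self.used("io") + vm.io <= limit * self.io_cap
        )


def place(vm: VM, hosts: List[Host], headroom: float = 0.2) -> Optional[Host]:
    """Place `vm` on the feasible host with the smallest added power draw."""
    candidates = [h for h in hosts if h.fits(vm, headroom)]
    if not candidates:
        return None  # no host can take the VM without eating into headroom
    best = min(candidates, key=lambda h: h.power(vm.cpu) - h.power())
    best.vms.append(vm)
    return best
```

Because waking an empty host costs its full idle power, this greedy rule tends to pack VMs onto hosts that are already running, which is the consolidation effect the abstract describes. A real scheduler would replace the linear model with predictions learned from execution logs and telemetry.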