Dairy farming is an energy-intensive sector that relies heavily on grid electricity. With increasing renewable energy integration, sustainable energy management has become essential for reducing grid dependence and supporting United Nations Sustainable Development Goal 7 on affordable and clean energy. However, the intermittent nature of renewables makes it challenging to balance supply and demand in real time, so intelligent load scheduling is crucial for minimizing operational costs while maintaining reliability. Reinforcement learning (RL) has shown promise in improving energy efficiency and reducing costs, but most RL-based scheduling methods assume complete knowledge of future prices or generation, which is unrealistic in dynamic environments. Moreover, standard Proximal Policy Optimization (PPO) variants rely on fixed clipping or KL-divergence thresholds, often leading to unstable training under variable tariffs. To address these challenges, this study proposes a deep reinforcement learning framework for efficient load scheduling in dairy farms, focusing on battery storage and water heating under realistic operational constraints. The proposed Forecast-Aware PPO incorporates short-term forecasts of demand and renewable generation through hour-of-day and month-based residual calibration, while the PID-KL PPO variant employs a proportional-integral-derivative (PID) controller to adaptively regulate KL divergence for stable policy updates. Trained on real-world dairy farm data, the method achieves electricity costs up to 1% lower than standard PPO, 4.8% lower than DQN, and 1.5% lower than SAC. For battery scheduling, PPO reduces grid imports by 13.1%, demonstrating scalability and effectiveness for sustainable energy management in modern dairy farming.
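The hour-of-day and month-based residual calibration mentioned above can be sketched as follows. This is a minimal illustration of the general technique, not the paper's implementation: the function names, the `(month, hour, forecast, actual)` record format, and the mean-residual correction rule are all assumptions for exposition.

```python
from collections import defaultdict


def calibrate_residuals(history):
    """Compute the mean forecast residual (actual - forecast) for each
    (month, hour-of-day) bucket from historical records.

    `history` is an iterable of (month, hour, forecast, actual) tuples;
    this record layout is illustrative, not the paper's data schema.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for month, hour, forecast, actual in history:
        key = (month, hour)
        sums[key] += actual - forecast
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def corrected_forecast(raw_forecast, month, hour, residuals):
    """Correct a raw short-term forecast by adding the historical mean
    residual for its (month, hour) bucket; unseen buckets pass through."""
    return raw_forecast + residuals.get((month, hour), 0.0)
```

For example, if January 8 a.m. forecasts have historically under-predicted demand by 2 kWh on average, a raw forecast of 10 kWh for that bucket is calibrated to 12 kWh.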
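The PID-regulated KL penalty can likewise be sketched. This is a hedged illustration of the general idea of steering PPO's KL-penalty coefficient toward a target divergence with a PID controller; the class name, gains, target value, and multiplicative update rule are assumptions, not the paper's reported configuration.

```python
import math


class PIDKLController:
    """Adapt PPO's KL-penalty coefficient beta toward a target KL.

    Illustrative sketch only: gains (kp, ki, kd) and target_kl are
    placeholder values, not tuned or taken from the paper.
    """

    def __init__(self, target_kl=0.01, kp=1.0, ki=0.1, kd=0.5):
        self.target_kl = target_kl
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, observed_kl, beta):
        # Error: how far the observed policy KL is from the target.
        error = observed_kl - self.target_kl
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        # PID correction: raise beta (stronger penalty) when the policy
        # update overshoots the target KL, lower it when updates are
        # overly conservative. A floor keeps beta strictly positive.
        adjustment = (self.kp * error
                      + self.ki * self.integral
                      + self.kd * derivative)
        return max(1e-4, beta * math.exp(adjustment))
```

After each policy update, the agent would call `update(observed_kl, beta)` and use the returned coefficient in the next iteration's KL-penalized objective, replacing the fixed thresholds that the abstract identifies as a source of training instability.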