Deadline-Aware, Energy-Efficient Control of Domestic Immersion Hot Water Heaters

Typical domestic immersion water heaters are often operated continuously during winter, heating quickly rather than efficiently and ignoring predictable demand windows and ambient losses. We study deadline-aware control, where the aim is to reach a target temperature at a specified time while minimising energy consumption. We introduce an efficient Gymnasium environment that models an immersion hot water heater with first-order thermal losses and two discrete actions, off (0 W) and on (6000 W), applied every 120 seconds. We compare a time-optimal bang-bang baseline, a zero-shot Monte Carlo Tree Search (MCTS) planner, and a Proximal Policy Optimisation (PPO) policy, and report total energy consumption in watt-hours under identical physical dynamics. Across sweeps of initial temperature from 10 to 30 degrees Celsius, deadline from 30 to 90 steps, and target temperature from 40 to 80 degrees Celsius, PPO achieves the most energy-efficient performance at a 60-step horizon (2 hours), using 3.23 kilowatt-hours, compared to 4.37 to 10.45 kilowatt-hours for bang-bang control and 4.18 to 6.46 kilowatt-hours for MCTS. Relative to bang-bang control, this corresponds to energy savings of 26 percent at 30 steps and 69 percent at 90 steps. In a representative trajectory with a 50 kg water mass, 20 degrees Celsius ambient temperature, and a 60 degrees Celsius target, PPO consumes 54 percent less energy than bang-bang control and 33 percent less than MCTS. These results show that learned deadline-aware control reduces energy consumption under identical physical assumptions, while planners provide partial savings without training and learned policies offer near-zero inference cost once trained.
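
To make the setting concrete, the following is a minimal sketch of a first-order tank model with on and off actions, written in plain Python rather than as a Gymnasium environment. The control interval, heater power, water mass, and ambient temperature come from the abstract; the heat-loss coefficient, the explicit Euler update, and both scheduling heuristics are illustrative assumptions and are not the environment or the controllers evaluated in the paper.

```python
# Minimal sketch of a first-order heater model with on/off actions.
# Values marked ASSUMED are illustrative and not taken from the paper.

DT_S = 120.0          # control interval in seconds (from the abstract)
POWER_ON_W = 6000.0   # heater power when switched on (from the abstract)
MASS_KG = 50.0        # water mass of the representative trajectory
T_AMBIENT_C = 20.0    # ambient temperature of the representative trajectory
C_WATER = 4186.0      # specific heat of water in J/(kg K)
UA_W_PER_K = 5.0      # ASSUMED lumped heat-loss coefficient in W/K


def step(temp_c, action):
    """Advance the tank temperature by one interval (explicit Euler, ASSUMED)."""
    power_w = POWER_ON_W if action == 1 else 0.0
    loss_w = UA_W_PER_K * (temp_c - T_AMBIENT_C)  # first-order loss to ambient
    return temp_c + DT_S * (power_w - loss_w) / (MASS_KG * C_WATER)


def rollout(temp_c, actions):
    """Apply a sequence of 0/1 actions and return the final temperature."""
    for action in actions:
        temp_c = step(temp_c, action)
    return temp_c


def energy_wh(actions):
    """Electrical energy drawn by a sequence of 0/1 actions, in watt-hours."""
    return sum(actions) * POWER_ON_W * DT_S / 3600.0


def thermostat_schedule(temp_c, target_c, deadline_steps):
    """Hypothetical deadline-unaware rule: heat whenever below the target."""
    actions = []
    for _ in range(deadline_steps):
        action = 1 if temp_c < target_c else 0
        actions.append(action)
        temp_c = step(temp_c, action)
    return actions


def late_start_schedule(temp_c, target_c, deadline_steps):
    """Hypothetical deadline-aware rule: stay off as long as possible,
    then heat continuously so the target is met at the deadline."""
    for start in range(deadline_steps, -1, -1):  # try the latest start first
        actions = [0] * start + [1] * (deadline_steps - start)
        if rollout(temp_c, actions) >= target_c:
            return actions
    return [1] * deadline_steps  # target unreachable: heat for the whole horizon


if __name__ == "__main__":
    early = thermostat_schedule(20.0, 60.0, 60)
    late = late_start_schedule(20.0, 60.0, 60)
    print("thermostat energy (Wh):", energy_wh(early))
    print("late-start energy (Wh):", energy_wh(late))
    print("late-start final temperature (C):", rollout(20.0, late))
```

Under these assumed parameters, the late-start schedule meets the 60 degrees Celsius target at the 60-step deadline while drawing less energy than the thermostat-like schedule, because water that is heated early spends more time above ambient and therefore loses more heat before the deadline. The absolute numbers depend on the assumed loss coefficient and should not be compared with the results reported in the abstract.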
