Reinforcement Learning-Based Co-Design and Operation of Chiller and Thermal Energy Storage for Cost-Optimal HVAC Systems

We study the joint operation and sizing of cooling infrastructure for commercial HVAC systems using reinforcement learning, with the objective of minimizing life-cycle cost over a 30-year horizon. The cooling system consists of a fixed-capacity electric chiller and a thermal energy storage (TES) unit, operated jointly to meet stochastic hourly cooling demands under time-varying electricity prices. The life-cycle cost accounts for both capital expenditure and discounted operating cost, the latter including electricity consumption and maintenance. A key challenge arises from the strong asymmetry in capital costs: increasing chiller capacity by one unit is far more expensive than an equivalent increase in TES capacity. As a result, identifying the right combination of chiller and TES sizes, while ensuring zero loss of cooling load under optimal operation, is a non-trivial co-design problem. To address this, we formulate the chiller operation problem for a fixed infrastructure configuration as a finite-horizon Markov Decision Process (MDP), in which the control action is the chiller part-load ratio (PLR). The MDP is solved using a Deep Q-Network (DQN) with a constrained action space. The learned DQN policy minimizes electricity cost over historical traces of cooling demand and electricity prices. For each candidate chiller-TES sizing configuration, the trained policy is evaluated. We then restrict attention to configurations that fully satisfy the cooling demand and perform a life-cycle cost minimization over this feasible set to identify the cost-optimal infrastructure design. Using this approach, we determine the optimal chiller and thermal energy storage capacities to be 700 and 1500, respectively.
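The design-selection step described above (evaluate each candidate configuration, keep only those with zero unmet cooling load, then minimize life-cycle cost over the feasible set) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `CostModel`, `annuity_factor`, `life_cycle_cost`, and `select_design` are hypothetical, the annual operating cost is assumed constant across years, and the `evaluate` callback stands in for a rollout of the trained DQN policy on historical demand and price traces.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class CostModel:
    # All fields are hypothetical placeholders, not figures from the paper.
    chiller_capex_per_unit: float  # capital cost per unit of chiller capacity
    tes_capex_per_unit: float      # capital cost per unit of TES capacity
    discount_rate: float           # annual discount rate
    horizon_years: int = 30        # life-cycle horizon stated in the abstract


def annuity_factor(rate: float, years: int) -> float:
    """Present value of 1 paid at the end of each year for `years` years."""
    if rate == 0:
        return float(years)
    return (1.0 - (1.0 + rate) ** -years) / rate


def life_cycle_cost(chiller: float, tes: float, annual_opex: float,
                    model: CostModel) -> float:
    """Capital expenditure plus discounted operating cost.

    Assumes the same operating cost (electricity plus maintenance) every year.
    """
    capex = (model.chiller_capex_per_unit * chiller
             + model.tes_capex_per_unit * tes)
    discounted_opex = annual_opex * annuity_factor(model.discount_rate,
                                                   model.horizon_years)
    return capex + discounted_opex


def select_design(
    candidates: Iterable[Tuple[float, float]],
    evaluate: Callable[[float, float], Tuple[float, float]],
    model: CostModel,
) -> Optional[Tuple[float, float, float]]:
    """Return (chiller, tes, cost) for the cheapest feasible design, or None.

    `evaluate(chiller, tes)` is assumed to return (annual_opex, unmet_load)
    obtained by running the operating policy for that configuration.
    """
    best = None
    for chiller, tes in candidates:
        annual_opex, unmet_load = evaluate(chiller, tes)
        if unmet_load > 0:
            continue  # infeasible: some cooling demand was not served
        cost = life_cycle_cost(chiller, tes, annual_opex, model)
        if best is None or cost < best[2]:
            best = (chiller, tes, cost)
    return best
```

Keeping the feasibility filter separate from the cost comparison mirrors the two-stage description above: an infeasible configuration is never ranked, however low its capital cost.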
