Buildings account for 40% of global energy consumption. A considerable portion of building energy consumption stems from heating, ventilation, and air conditioning (HVAC), so implementing smart, energy-efficient HVAC systems has the potential to significantly affect the course of climate change. In recent years, model-free reinforcement learning algorithms have been increasingly assessed for this purpose because they can learn and adapt purely from experience. They have been shown to outperform classical controllers in terms of energy cost and consumption, as well as thermal comfort. Their weakness, however, is relatively poor data efficiency: they require long periods of training to reach acceptable policies, which makes them unsuitable for direct deployment on real-world controllers. Common research goals are therefore to improve their learning speed and their ability to generalize, in order to facilitate transfer learning to unseen building environments. In this paper, we take a federated learning approach to training the reinforcement learning controller of an HVAC system. A global control policy is learned by aggregating local policies trained on multiple data centers located in different climate zones. The goal of the policy is to simultaneously minimize energy consumption and maximize thermal comfort. The federated optimization strategy indirectly increases both the rate at which experience data is collected and the variation in the data. We demonstrate through experimental evaluation that these effects lead to faster learning and greater generalization capability in the federated policy than in any individually trained policy.
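As a rough illustration of what aggregating local policies into a global policy can look like, the sketch below averages the parameters of several locally trained policies, optionally weighted by how much experience each site collected. The abstract does not specify the aggregation rule, so weighted parameter averaging (in the style of federated averaging) is an assumption here, and the names `federated_average`, `local_params`, and `weights` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# A policy is represented as a mapping from parameter name to a flat list
# of floats. This representation is an assumption made for illustration.
Params = Dict[str, List[float]]


def federated_average(
    local_params: Sequence[Params],
    weights: Optional[Sequence[float]] = None,
) -> Params:
    """Combine local policy parameters into one global parameter set.

    Each parameter of the result is the weighted mean of the corresponding
    local parameters. With weights=None every site counts equally.
    """
    if not local_params:
        raise ValueError("need at least one local policy")
    if weights is None:
        weights = [1.0] * len(local_params)
    if len(weights) != len(local_params):
        raise ValueError("one weight per local policy is required")
    total = float(sum(weights))
    if total <= 0.0:
        raise ValueError("weights must sum to a positive value")

    global_params: Params = {}
    for name, reference in local_params[0].items():
        accumulated = [0.0] * len(reference)
        for params, weight in zip(local_params, weights):
            if len(params[name]) != len(reference):
                raise ValueError(f"shape mismatch for parameter {name!r}")
            for i, value in enumerate(params[name]):
                accumulated[i] += weight * value
        # Normalize once at the end so the result is a weighted mean.
        global_params[name] = [value / total for value in accumulated]
    return global_params


# Hypothetical parameters from three sites in different climate zones.
site_a = {"policy.w": [1.0, 2.0]}
site_b = {"policy.w": [3.0, 4.0]}
site_c = {"policy.w": [5.0, 6.0]}
global_policy = federated_average([site_a, site_b, site_c])
```

In a full training loop, the global parameters would be sent back to each site after every aggregation round, so that all sites continue training from the shared policy. Weighting by the number of collected transitions is one common choice, not necessarily the one used in the paper.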