Recent advancements in machine learning based energy management approaches, specifically reinforcement learning with a safety layer (OptLayerPolicy) and a metaheuristic algorithm generating a decision tree control policy (TreeC), have shown promise. However, their effectiveness has only been demonstrated in computer simulations. This paper presents a real-world validation of these methods, comparing them against model predictive control and a simple rule-based control benchmark. The experiments were conducted on the electrical installation of four reproductions of residential houses, each equipped with its own battery, photovoltaic system, and dynamic load system emulating a non-controllable electrical load and a controllable electric vehicle charger. The results show that the simple rule-based, TreeC, and model predictive control based methods achieved similar costs, with a difference of only 0.6%. The reinforcement learning based method, still in its training phase, incurred a cost 25.5% higher than the other methods. Additional simulations show that costs can be further reduced by using a more representative training dataset for TreeC and by addressing errors in the model predictive control implementation caused by its reliance on accurate data from various sources. The OptLayerPolicy safety layer enables safe online training of a reinforcement learning agent in the real world, given an accurate formulation of the constraint function. Although the proposed safety layer method remains error-prone, it is found beneficial for all investigated methods. The TreeC method, which does require building a realistic simulation for training, exhibits the safest operational performance, exceeding the grid limit by only 27.1 Wh compared to 593.9 Wh for reinforcement learning.