It is often challenging for users to articulate their preferences accurately in multi-objective decision-making problems. Demonstration-based preference inference (DemoPI) is a promising approach to mitigating this problem. Residential energy use is one such scenario: customers trade off multiple objectives, e.g. cost and comfort, and preference inference can provide insight into their behaviours and values. In this work, we applied the state-of-the-art DemoPI method, the dynamic weight-based preference inference (DWPI) algorithm, in a multi-objective residential energy consumption setting to infer preferences from energy consumption demonstrations produced by simulated users following a rule-based approach. According to our experimental results, the DWPI model achieves accurate demonstration-based preference inference in three scenarios. These advancements enhance the usability and effectiveness of multi-objective reinforcement learning (MORL) in energy management, enabling more intuitive and user-friendly preference specification and opening the door for DWPI to be applied in real-world settings.