Load shapes derived from smart meter data are frequently used to analyze daily energy consumption patterns, particularly in applications such as Demand Response (DR). One of the main challenges, however, lies in identifying the most suitable clusters of consumers with similar consumption behaviors. In this paper, we present a novel machine-learning-based framework for optimal load profiling and demonstrate it on a real case study using data from almost 5000 households in London. Four widely used clustering algorithms are applied: K-means, K-medoids, Hierarchical Agglomerative Clustering, and Density-based Spatial Clustering. These algorithms are assessed through an empirical analysis and multiple evaluation metrics. We then reformulate the problem as a probabilistic classification task in which a classifier emulates the behavior of a clustering algorithm, and we leverage Explainable AI (xAI) to enhance the interpretability of our solution. The clustering analysis indicates that the optimal number of clusters for this case is seven. However, our methodology shows that two of these clusters, accounting for almost 10% of the dataset, exhibit significant internal dissimilarity, and it therefore splits them further to yield nine clusters in total. The scalability and versatility of our solution make it an ideal choice for power utility companies aiming to segment their users in order to create more targeted Demand Response programs.
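The cluster-then-classify idea summarized above can be illustrated with a minimal sketch. The abstract does not specify the classifier, the features, or the splitting criterion, so everything here is an assumption: the load shapes are synthetic, plain K-means stands in for the clustering stage, a softmax over centroid distances stands in for the trained probabilistic classifier, and the 0.8 confidence threshold is a hypothetical value chosen only for illustration.

```python
import math
import random


def make_profiles(rng, n_per_group=40):
    """Synthetic 24-hour load shapes (assumed data, not the London dataset)."""
    def bump(hour, center, width=2.5):
        return math.exp(-((hour - center) ** 2) / (2 * width ** 2))

    templates = [
        [0.2 + bump(h, 8) for h in range(24)],   # morning peak
        [0.2 + bump(h, 19) for h in range(24)],  # evening peak
        [0.6 for _ in range(24)],                # flat consumption
    ]
    profiles = []
    for template in templates:
        for _ in range(n_per_group):
            noisy = [max(v + rng.gauss(0, 0.05), 0.0) for v in template]
            total = sum(noisy)
            # Normalize so each profile describes shape, not total energy.
            profiles.append([v / total for v in noisy])
    return profiles


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's algorithm with random initial centroids."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def membership_probs(point, centroids, scale):
    """Softmax over negative squared distances: a stand-in soft classifier."""
    logits = [-sq_dist(point, c) / scale for c in centroids]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
profiles = make_profiles(rng)
centroids, labels = kmeans(profiles, k=3, rng=rng)

# Use the mean within-cluster squared distance as the softmax scale.
scale = sum(sq_dist(p, centroids[l]) for p, l in zip(profiles, labels)) / len(profiles)
probs = [membership_probs(p, centroids, scale) for p in profiles]

# Average top-class probability per cluster; low values hint at internal dissimilarity.
confidence = {}
for c in range(len(centroids)):
    member_conf = [max(pr) for pr, l in zip(probs, labels) if l == c]
    if member_conf:
        confidence[c] = sum(member_conf) / len(member_conf)

THRESHOLD = 0.8  # hypothetical cut-off, not taken from the paper
for c, conf in sorted(confidence.items()):
    verdict = "candidate for further splitting" if conf < THRESHOLD else "coherent"
    print(f"cluster {c}: mean confidence {conf:.2f} ({verdict})")
```

In a fuller pipeline, the distance-based softmax would presumably be replaced by a classifier trained on the cluster labels, whose predicted probabilities could then be examined with an explainability tool. The sketch only shows why a probabilistic view can expose clusters that a hard assignment treats as homogeneous.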