This work compares the performance of four clustering techniques, with the objective of achieving strong hybrid models in supervised learning tasks. A real dataset has been collected from Sotavento, a bio-climatic house located on an experimental wind farm in Xermade (Lugo), Galicia (Spain). The authors have chosen the house's solar thermal generation system to study how applying several clustering methods followed by a regression technique performs when predicting the output temperature of the system. To assess the quality of each clustering method, two approaches have been implemented. The first is based on three unsupervised learning metrics (Silhouette, Calinski-Harabasz and Davies-Bouldin), while the second employs the most common error measurements for a regression algorithm such as a Multi-Layer Perceptron.
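The two evaluation approaches described above can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the four clustering techniques, so K-Means is used here purely as an assumed example. The Sotavento data is replaced by synthetic data from a hypothetical helper, `make_synthetic_data`, and fitting one Multi-Layer Perceptron per cluster is an assumption about how the hybrid model is assembled. All function names, hyperparameters and the train/test split are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    mean_absolute_error,
    mean_squared_error,
    silhouette_score,
)
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler


def make_synthetic_data(n_samples=300, seed=0):
    """Hypothetical stand-in for the real measurements (not the Sotavento data)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n_samples, 3))
    # Two assumed operating regimes with different input/output relations.
    regime = X[:, 0] > 0.5
    y = np.where(regime, 40.0 + 20.0 * X[:, 1], 20.0 + 5.0 * X[:, 2])
    y = y + rng.normal(0.0, 0.5, size=n_samples)
    return X, y


def evaluate_hybrid(X, y, n_clusters=2, seed=0):
    """Cluster the inputs, then fit one MLP regressor per cluster (assumed design)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    scaler = StandardScaler().fit(X_train)
    X_train_s = scaler.transform(X_train)
    X_test_s = scaler.transform(X_test)

    # Clustering step: K-Means is an assumption, chosen only for illustration.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X_train_s)
    train_labels = km.labels_
    test_labels = km.predict(X_test_s)

    # Approach 1: unsupervised quality metrics computed on the training partition.
    results = {
        "silhouette": silhouette_score(X_train_s, train_labels),
        "calinski_harabasz": calinski_harabasz_score(X_train_s, train_labels),
        "davies_bouldin": davies_bouldin_score(X_train_s, train_labels),
    }

    # Approach 2: regression error of the hybrid model on held-out samples.
    y_pred = np.empty_like(y_test)
    for c in range(n_clusters):
        test_mask = test_labels == c
        if not test_mask.any():
            continue  # no held-out sample was assigned to this cluster
        model = MLPRegressor(
            hidden_layer_sizes=(16,), solver="lbfgs", max_iter=1000, random_state=seed
        )
        model.fit(X_train_s[train_labels == c], y_train[train_labels == c])
        y_pred[test_mask] = model.predict(X_test_s[test_mask])

    mse = mean_squared_error(y_test, y_pred)
    results["mse"] = mse
    results["rmse"] = float(np.sqrt(mse))
    results["mae"] = mean_absolute_error(y_test, y_pred)
    return results


if __name__ == "__main__":
    X, y = make_synthetic_data()
    for name, value in evaluate_hybrid(X, y).items():
        print(f"{name}: {value:.4f}")
```

Higher Silhouette and Calinski-Harabasz values and lower Davies-Bouldin values indicate better-separated clusters, whereas the regression errors measure how useful the partition is for the prediction task itself. The two rankings need not agree, which is why comparing both is informative.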