Parameter prediction is essential for many applications, supporting interpretation and decision-making. However, in many real-world domains, such as power systems, medicine, and engineering, ground truth labels can be very expensive to acquire because they may require extensive laboratory testing. In this work, we introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs), which significantly reduces the number of labeled data points required for parameter prediction by exploiting the information contained in large unlabeled datasets. The proposed method first trains SOMs on unlabeled data and then assigns a minimal number of available labeled data points to key best matching units (BMUs). The value estimated for a newly encountered data point is computed using the average of the $n$ closest labeled data points in the SOM's U-matrix in tandem with a topological shortest-path distance calculation scheme. Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques, including linear and polynomial regression, Gaussian process regression, and K-nearest neighbors, as well as deep neural network models and related clustering schemes.
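The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a small rectangular SOM trained with a basic online update rule, a 4-connected grid graph whose edge costs are the distances between neighboring unit weight vectors (the quantities a U-matrix summarizes), and Dijkstra's algorithm for the shortest-path distance. All function names (`train_som`, `bmu`, `grid_graph`, `shortest_paths`, `predict`), the hyperparameters, and the synthetic data are hypothetical, and how "key" BMUs are selected is not specified in the text, so every labeled point is simply attached to its own BMU here.

```python
import heapq
import numpy as np

def bmu(weights, x):
    """Index of the best matching unit: the unit whose weight vector is closest to x."""
    return int(np.argmin(np.sum((weights - x) ** 2, axis=1)))

def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM on unlabeled data (assumed basic online rule)."""
    rng = np.random.default_rng(seed)
    sigma0 = sigma0 or max(rows, cols) / 2.0
    # Initialise unit weights from randomly chosen training samples.
    weights = data[rng.integers(0, len(data), rows * cols)].astype(float)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5      # shrinking neighbourhood width
            d2 = np.sum((grid - grid[bmu(weights, x)]) ** 2, axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))     # Gaussian neighbourhood on the grid
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def grid_graph(weights, rows, cols):
    """4-connected grid graph; edge cost = distance between neighbouring unit weights."""
    adj = {i: [] for i in range(rows * cols)}
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    j = rr * cols + cc
                    w = float(np.linalg.norm(weights[i] - weights[j]))
                    adj[i].append((j, w))
                    adj[j].append((i, w))
    return adj

def shortest_paths(adj, source):
    """Dijkstra's algorithm: shortest-path cost from source to every reachable unit."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, np.inf):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, np.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def predict(weights, adj, labeled_units, x, n=3):
    """Average the labels of the n labeled points nearest to x's BMU along the map."""
    dist = shortest_paths(adj, bmu(weights, x))
    ranked = sorted(labeled_units, key=lambda u: dist[u])
    nearest = [y for u in ranked for y in labeled_units[u]][:n]
    return float(np.mean(nearest))

# Synthetic demo (assumed data): many unlabeled points, only ten labeled ones.
rng = np.random.default_rng(1)
unlabeled = rng.random((300, 2))
labeled_x = rng.random((10, 2))
labeled_y = labeled_x.sum(axis=1)           # hypothetical target: x0 + x1

weights = train_som(unlabeled, 6, 6)
adj = grid_graph(weights, 6, 6)
labeled_units = {}
for x, y in zip(labeled_x, labeled_y):
    labeled_units.setdefault(bmu(weights, x), []).append(float(y))

estimate = predict(weights, adj, labeled_units, np.array([0.5, 0.5]), n=3)
print(f"estimate for (0.5, 0.5): {estimate:.3f}")
```

Using path cost along the trained map, rather than straight-line distance in input space, means two points count as close only if the map connects them through regions of similar units. This is one plausible reading of the "topological shortest path" idea and should be treated as an assumption.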