Parameter prediction is essential for many applications, supporting insightful interpretation and decision-making. However, in many real-world domains, such as power systems, medicine, and engineering, acquiring ground truth labels for certain datasets can be very costly, as they may require extensive laboratory testing. In this work, we introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs), which significantly reduces the number of labeled data points required for parameter prediction by exploiting the information contained in large unlabeled datasets. Our proposed method first trains SOMs on unlabeled data and then assigns a minimal number of available labeled data points to key best matching units (BMUs). The values estimated for newly encountered data points are computed using the average of the $n$ closest labeled data points in the SOM's U-matrix, in tandem with a topological shortest-path distance calculation scheme. Our results indicate that the proposed semi-supervised model significantly outperforms traditional regression techniques, including linear and polynomial regression, Gaussian process regression, K-nearest neighbors, as well as various deep neural network models.
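The pipeline described above (train a SOM on unlabeled data, attach a few labeled points to their BMUs, then average the $n$ topologically closest labels) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the rectangular 4-connected grid, the linear decay schedules, and the choice of edge cost (Euclidean distance between neighboring units' weight vectors, i.e. the quantities a U-matrix summarizes) are all assumptions made for illustration.

```python
import heapq
import numpy as np


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular SOM with online updates (illustrative only)."""
    rng = np.random.default_rng(seed)
    sigma0 = sigma0 or max(rows, cols) / 2.0
    # Initialise each unit's weight vector from a randomly chosen sample.
    weights = data[rng.integers(0, len(data), rows * cols)].astype(float)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # assumed linear decay
            sigma = sigma0 * (1.0 - frac) + 1e-2     # assumed linear decay
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))     # Gaussian neighbourhood
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def bmu_index(weights, x):
    """Index of the best matching unit (closest weight vector) for x."""
    return int(np.argmin(((weights - x) ** 2).sum(axis=1)))


def topological_distances(weights, rows, cols, source):
    """Dijkstra over the 4-connected grid of units.

    Assumption: the cost of an edge is the Euclidean distance between the
    weight vectors of the two neighbouring units.
    """
    dist = np.full(rows * cols, np.inf)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        r, c = divmod(u, cols)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                v = rr * cols + cc
                nd = d + float(np.linalg.norm(weights[u] - weights[v]))
                if nd < dist[v]:
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
    return dist


def predict(weights, rows, cols, labeled_x, labeled_y, x_new, n=3):
    """Average the labels of the n labeled points whose BMUs are closest,
    in shortest-path distance, to the BMU of x_new."""
    labeled_units = np.array([bmu_index(weights, x) for x in labeled_x])
    dist = topological_distances(weights, rows, cols, bmu_index(weights, x_new))
    nearest = np.argsort(dist[labeled_units])[:n]
    return float(np.mean(np.asarray(labeled_y)[nearest]))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    unlabeled = rng.uniform(0, 1, size=(300, 2))   # synthetic unlabeled data
    som = train_som(unlabeled, rows=6, cols=6, epochs=5)
    labeled_x = rng.uniform(0, 1, size=(8, 2))     # a handful of labeled points
    labeled_y = labeled_x.sum(axis=1)              # hypothetical target parameter
    print(predict(som, 6, 6, labeled_x, labeled_y, np.array([0.5, 0.5]), n=3))
```

In this sketch only `predict` touches the labels, so the SOM itself can be trained once on the large unlabeled set and reused as labeled points are added. The synthetic data and the target `labeled_y` are placeholders, not results from the paper.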