Gaussian-Bernoulli restricted Boltzmann machines (GBRBMs) are often used for semi-supervised anomaly detection, where they are trained using only normal data points. In GBRBM-based anomaly detection, data points are classified as normal or anomalous based on a score that is identical to an energy function of the marginal GBRBM. However, the classification threshold is difficult to set to an appropriate value because the raw score has no direct interpretation. In this study, we propose a measure that improves the score's interpretability based on its cumulative distribution, and we establish a guideline for setting the threshold using this interpretable measure. The results of numerical experiments show that the guideline is reasonable when the threshold is set using only normal data points. Moreover, because obtaining the measure requires the minimum value of the score, whose exact evaluation is computationally infeasible, we also propose a method for evaluating the minimum score based on simulated annealing, which is widely used for optimization problems. The proposed evaluation method was also validated in numerical experiments.
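The pipeline summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the GBRBM parameterization, the exact form of the proposed measure, or the annealing schedule, so everything below is an assumption. The sketch assumes one common energy form with hidden units in {0, 1}, uses a plain empirical quantile of normal-data scores as a stand-in for a cumulative-distribution-based threshold, and uses Gaussian proposals with geometric cooling for simulated annealing. The names `free_energy`, `cdf_threshold`, and `anneal_min_score` are hypothetical.

```python
import numpy as np


def free_energy(v, W, b, c, sigma):
    """Score of each row of v: the marginal free energy of a GBRBM.

    Assumed form (one common parameterization, hidden units in {0, 1}):
    F(v) = sum_i (v_i - b_i)^2 / (2 sigma_i^2)
           - sum_j softplus(c_j + sum_i v_i W_ij / sigma_i^2)
    """
    v = np.atleast_2d(v)
    quad = np.sum((v - b) ** 2 / (2.0 * sigma ** 2), axis=1)
    pre = c + (v / sigma ** 2) @ W
    # logaddexp(0, x) is a numerically stable softplus.
    return quad - np.sum(np.logaddexp(0.0, pre), axis=1)


def cdf_threshold(normal_scores, q=0.95):
    """Threshold at the q-quantile of the empirical score distribution.

    Assumption: a plain quantile, not the measure proposed in the paper.
    """
    return float(np.quantile(normal_scores, q))


def anneal_min_score(W, b, c, sigma, v0, n_steps=5000, t0=1.0,
                     cooling=0.999, step=0.1, seed=0):
    """Approximate min_v F(v) by simulated annealing (assumed schedule)."""
    rng = np.random.default_rng(seed)
    v = np.array(v0, dtype=float)
    f = free_energy(v, W, b, c, sigma)[0]
    best_v, best_f = v.copy(), f
    t = t0
    for _ in range(n_steps):
        cand = v + step * rng.standard_normal(v.shape)
        f_cand = free_energy(cand, W, b, c, sigma)[0]
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if f_cand <= f or rng.random() < np.exp(-(f_cand - f) / t):
            v, f = cand, f_cand
            if f < best_f:
                best_v, best_f = v.copy(), f
        t *= cooling  # geometric cooling
    return best_v, float(best_f)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random parameters stand in for a trained model (illustration only).
    W = 0.1 * rng.standard_normal((4, 3))
    b, c, sigma = np.zeros(4), np.zeros(3), np.ones(4)
    normal_data = rng.standard_normal((1000, 4))
    scores = free_energy(normal_data, W, b, c, sigma)
    print("threshold:", cdf_threshold(scores, 0.95))
    print("approx. minimum score:", anneal_min_score(W, b, c, sigma, b)[1])
```

In this sketch, a point whose score exceeds the threshold would be flagged as anomalous, and the annealed minimum gives a reference value for the lower end of the score range. Simulated annealing returns only an approximation, so the result may sit above the true minimum.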