RF-GAP was recently introduced as an improved random forest proximity measure. In this paper, we present PF-GAP, an extension of RF-GAP proximities to proximity forests, an accurate and efficient model for time series classification. We combine the forest proximities with multidimensional scaling (MDS) to obtain vector embeddings of univariate time series, and we compare these embeddings with those obtained from various time series distance measures. We also combine the forest proximities with the Local Outlier Factor (LOF) to investigate the connection between misclassified points and outliers, comparing against nearest-neighbor classifiers that use time series distance measures. We show that the forest proximities may exhibit a stronger connection between misclassified points and outliers than nearest-neighbor classifiers do.
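As a rough illustration of the outlier analysis described above, the following pure-Python sketch turns a proximity matrix into dissimilarities and scores each point with a simplified Local Outlier Factor. Several things here are assumptions rather than the paper's method: the proximity values are made up and not produced by PF-GAP, the conversion (symmetrize, then take one minus the proximity) is just one common choice, the function names are hypothetical, and the LOF variant uses exactly k neighbors instead of handling ties in the k-distance.

```python
def proximities_to_distances(prox):
    """Symmetrize a proximity matrix and map it to dissimilarities via 1 - p.

    Assumption: proximities lie in [0, 1]; averaging p[i][j] and p[j][i]
    is one simple way to handle an asymmetric proximity measure.
    """
    n = len(prox)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dist[i][j] = 1.0 - 0.5 * (prox[i][j] + prox[j][i])
    return dist


def local_outlier_factor(dist, k):
    """Simplified LOF from a precomputed distance matrix (exactly k neighbors)."""
    n = len(dist)
    neighbors, k_distance = [], []
    for i in range(n):
        # Other points ordered by distance from point i.
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(1.0 / max(sum(reach) / k, 1e-12))

    # LOF: average neighbor density relative to the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


# Hypothetical toy proximities: points 0-3 are mutually close, point 4 is not.
toy_prox = [
    [1.0, 0.8, 0.7, 0.8, 0.1],
    [0.8, 1.0, 0.8, 0.7, 0.1],
    [0.7, 0.8, 1.0, 0.8, 0.1],
    [0.8, 0.7, 0.8, 1.0, 0.2],
    [0.1, 0.1, 0.1, 0.2, 1.0],
]

toy_dist = proximities_to_distances(toy_prox)
scores = local_outlier_factor(toy_dist, k=2)
most_outlying = max(range(len(scores)), key=lambda i: scores[i])
print(scores, most_outlying)
```

Scores near 1 indicate a point whose local density matches that of its neighbors, while larger scores flag candidate outliers. In a study like the one described, such scores could then be compared with the set of misclassified points.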