RF-GAP has recently been introduced as an improved random forest proximity measure. In this paper, we present PF-GAP, an extension of RF-GAP proximities to proximity forests, an accurate and efficient time series classification model. We use the forest proximities in connection with Multi-Dimensional Scaling to obtain vector embeddings of univariate time series, comparing the embeddings to those obtained using various time series distance measures. We also use the forest proximities alongside Local Outlier Factors to investigate the connection between misclassified points and outliers, comparing with nearest neighbor classifiers which use time series distance measures. We show that the forest proximities may exhibit a stronger connection between misclassified points and outliers than nearest neighbor classifiers.
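As an illustration of the pipeline summarized above, the following minimal sketch shows how a forest proximity matrix could be turned into vector embeddings and outlier scores. It is not the paper's implementation. The toy proximity matrix is made up, the conversion `1 - proximity` after symmetrization is an assumption, classical (Torgerson) MDS stands in for whichever MDS variant is actually used, and the helper names `proximity_to_distance`, `classical_mds`, and `local_outlier_factor` are hypothetical. Only NumPy is assumed.

```python
import numpy as np


def proximity_to_distance(prox):
    """Map proximities to dissimilarities (assumption: d = 1 - p, values in [0, 1])."""
    prox = np.asarray(prox, dtype=float)
    sym = (prox + prox.T) / 2.0            # assumption: average both directions
    dist = np.clip(1.0 - sym, 0.0, None)
    np.fill_diagonal(dist, 0.0)            # a point is at distance 0 from itself
    return dist


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS: double-center squared distances, then eigendecompose."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    vals, vecs = np.linalg.eigh(gram)
    order = np.argsort(vals)[::-1][:n_components]   # largest eigenvalues first
    top_vals = np.clip(vals[order], 0.0, None)      # drop negative eigenvalues
    return vecs[:, order] * np.sqrt(top_vals)


def local_outlier_factor(dist, k=2):
    """Local Outlier Factor computed directly from a precomputed distance matrix."""
    n = dist.shape[0]
    d = dist.copy()
    np.fill_diagonal(d, np.inf)                     # exclude self from neighbors
    rows = np.arange(n)
    nbrs = np.argsort(d, axis=1)[:, :k]             # k nearest neighbors per point
    k_dist = d[rows, nbrs[:, -1]]                   # distance to the k-th neighbor
    # Reachability distance from each point to each of its neighbors.
    reach = np.maximum(k_dist[nbrs], d[rows[:, None], nbrs])
    lrd = 1.0 / reach.mean(axis=1)                  # local reachability density
    return lrd[nbrs].mean(axis=1) / lrd


# Toy proximities (made up): five mutually similar series and one dissimilar series.
prox = np.full((6, 6), 0.8)
prox[5, :] = 0.1
prox[:, 5] = 0.1
np.fill_diagonal(prox, 1.0)

dist = proximity_to_distance(prox)
embedding = classical_mds(dist, n_components=2)     # one 2-D vector per series
scores = local_outlier_factor(dist, k=2)            # scores well above 1 suggest outliers
```

In this sketch, the points flagged by `scores` could then be compared with the points a classifier misclassifies. Swapping `dist` for a matrix built from a time series distance measure would give the kind of nearest-neighbor baseline described above.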