Angular Minkowski $p$-distance is a dissimilarity measure obtained by replacing the Euclidean distance in the definition of cosine dissimilarity with another Minkowski $p$-distance. Cosine dissimilarity is frequently used with datasets containing token frequencies, and angular Minkowski $p$-distance may be an even better choice for certain tasks. In a case study based on the 20-newsgroups dataset, we evaluate classification performance for classical weighted nearest neighbours, as well as for fuzzy rough nearest neighbours. In addition, we analyse the relationship between the hyperparameter $p$, the dimensionality $m$ of the dataset, the number of neighbours $k$, the choice of weights and the choice of classifier. We conclude that, for suitable values of $p$, angular Minkowski $p$-distance can achieve substantially higher classification performance than classical cosine dissimilarity.
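The abstract defines the measure only in words. The Python sketch below illustrates the idea under two stated assumptions. First, cosine dissimilarity $1 - \cos(x, y)$ is written as half the squared Euclidean distance between the unit-length (2-norm) versions of $x$ and $y$. Second, the angular variant normalises each vector by its own $p$-norm and then takes the Minkowski $p$-distance. The exact normalisation and any exponent used in the paper may differ, and all function names here are hypothetical.

```python
import math

def minkowski_norm(x, p):
    """Minkowski p-norm (a quasi-norm when 0 < p < 1)."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def minkowski_distance(x, y, p):
    """Minkowski p-distance between two equal-length vectors."""
    return minkowski_norm([a - b for a, b in zip(x, y)], p)

def normalise(x, p):
    """Scale x to unit p-norm (zero vectors are not handled in this sketch)."""
    n = minkowski_norm(x, p)
    return [v / n for v in x]

def cosine_dissimilarity(x, y):
    """1 - cos(x, y), expressed via the Euclidean distance of unit vectors:
    ||u - v||_2^2 = 2 - 2 cos(x, y) when ||u||_2 = ||v||_2 = 1."""
    return minkowski_distance(normalise(x, 2), normalise(y, 2), 2) ** 2 / 2

def angular_minkowski_distance(x, y, p):
    """Hypothetical angular Minkowski p-distance: p-distance between
    vectors normalised by their p-norm (assumed definition)."""
    return minkowski_distance(normalise(x, p), normalise(y, p), p)

# Usage on two toy token-count vectors.
x, y = [3, 0, 1], [1, 2, 0]
print(cosine_dissimilarity(x, y))
for p in (0.5, 1, 2, 4):
    print(p, angular_minkowski_distance(x, y, p))
```

Under these assumptions, setting $p = 2$ gives $\sqrt{2\,(1 - \cos(x, y))}$, a monotone transformation of cosine dissimilarity. It therefore ranks neighbours in the same order, and only other values of $p$ change which documents count as nearest. Vectors that point in the same direction, such as `[1, 2]` and `[2, 4]`, are at distance 0 for every $p$.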