The fundamental idea behind K-Nearest Neighbors (KNN) is to classify a sample by majority vote among its nearest neighbors. Although the distances and labels of the neighbors are critical in KNN, traditional KNN treats all neighbors equally. Some KNN variants instead weight neighbors differently according to a specific rule that considers each neighbor's distance and label. Many of these variants introduce complex algorithms that do not significantly outperform traditional KNN and often lead to less satisfactory outcomes. Reliably extracting the information needed to predict the true weights accurately remains an open research challenge. Our proposed method, information-modified KNN (IMKNN), bridges this gap with a straightforward algorithm that achieves effective results. By exploiting mutual information (MI) and incorporating ideas from Shapley values, IMKNN improves on traditional KNN in accuracy, precision, and recall. To evaluate its effectiveness, we compare it with eight KNN variants. In experiments on 12 widely used datasets, IMKNN improves accuracy, precision, and recall by 11.05\%, 12.42\%, and 12.07\%, respectively, compared to traditional KNN. Additionally, we compare IMKNN with traditional KNN on four large-scale datasets to highlight its advantages under the influence of monotonicity, noise, density, subclusters, and skewed distributions. Our results indicate that IMKNN consistently surpasses the other methods across diverse datasets.
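To make the contrast between equal and weighted neighbor votes concrete, the following is a minimal sketch in plain Python. It is not the IMKNN algorithm, whose weighting rule (based on mutual information and Shapley-value ideas) is not specified in this abstract. The weighted variant uses inverse-distance weighting, a common scheme swapped in purely for illustration. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length point tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(train, query, k):
    """Return the k (distance, label) pairs closest to the query."""
    dists = [(euclidean(x, query), y) for x, y in train]
    return sorted(dists, key=lambda t: t[0])[:k]


def knn_majority(train, query, k=3):
    """Traditional KNN: every neighbor casts one equal vote."""
    neighbors = k_nearest(train, query, k)
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def knn_distance_weighted(train, query, k=3, eps=1e-9):
    """Illustrative weighted variant (assumption: inverse-distance weights).

    Each neighbor contributes 1 / (distance + eps) to its label's score,
    so closer neighbors count more than distant ones.
    """
    neighbors = k_nearest(train, query, k)
    scores = defaultdict(float)
    for dist, label in neighbors:
        scores[label] += 1.0 / (dist + eps)
    return max(scores, key=scores.get)


# Hypothetical toy data: one close "a" point and two distant "b" points.
train = [((0.0, 0.0), "a"), ((3.0, 0.0), "b"), ((0.0, 3.0), "b")]
query = (0.5, 0.5)

majority_label = knn_majority(train, query, k=3)
weighted_label = knn_distance_weighted(train, query, k=3)
print(majority_label, weighted_label)
```

On this toy data the two rules disagree: the equal vote follows the two distant "b" points, while the inverse-distance vote follows the single nearby "a" point. This is the kind of sensitivity to neighbor weighting that motivates the weighted variants discussed above.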