Graph-based k-nearest-neighbor (kNN) algorithms are widely used in machine learning because they are simple and effective. However, a conventional kNN graph relies on a single fixed value of k, which can limit its performance on complex data distributions. In addition, as with other classification models, ambiguous samples near decision boundaries are more prone to misclassification. To address both issues, we propose the Preferential Attached k-Nearest Neighbors Graph (paNNG), which combines adaptive kNN with distribution-based graph construction. By incorporating distribution information, paNNG "pulls" ambiguous samples towards their original classes, which significantly improves their classification and, in turn, overall accuracy and generalization. In evaluations on diverse benchmark datasets, paNNG outperforms state-of-the-art algorithms, demonstrating its adaptability and efficacy across a range of real-world scenarios.