Graph-based kNN algorithms are widely used in machine learning tasks because of their simplicity and effectiveness. However, real-world data often follow complex distributions, and the conventional kNN graph's reliance on a single, uniform k value can hinder its performance. A crucial factor behind this limitation is the presence of ambiguous samples along decision boundaries, which are inherently more prone to misclassification. To address this, we propose the Preferential Attached k-Nearest Neighbors Graph (paNNG), which incorporates a distribution-aware adaptive k into graph construction. By treating distribution information as a cohesive whole, paNNG can significantly improve performance on ambiguous samples by "pulling" them towards their original classes, thereby enhancing overall generalization. In rigorous evaluations on diverse datasets, paNNG outperforms state-of-the-art algorithms, demonstrating its adaptability and efficacy across a variety of real-world scenarios.