Graph-based kNN algorithms have gained wide popularity in machine learning because of their simplicity and effectiveness. However, real-world data often follow complex distributions, so the conventional kNN graph's reliance on a single, uniform k value can limit its performance. A key cause of this limitation is the presence of ambiguous samples near decision boundaries, which are inherently more prone to misclassification. To address this, we propose the Distribution-Informed adaptive kNN Graph (DaNNG), which combines adaptive kNN with distribution-aware graph construction. By pairing an approximation of the data distribution with customized k-adaptation criteria, DaNNG can significantly improve performance on ambiguous samples and thereby enhance overall accuracy and generalization capability. In rigorous evaluations on diverse benchmark datasets, DaNNG outperforms state-of-the-art algorithms, demonstrating its adaptability and efficacy across a range of real-world scenarios.
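To make the contrast between a single global k and a per-sample k concrete, here is a minimal, generic sketch of an adaptive-k kNN classifier. It is not DaNNG: the abstract does not specify DaNNG's distribution approximation or its k-adaptation criteria. The expansion rule used here (start at `k_min` and grow k until the majority label among the neighbors reaches a `purity` threshold, or until `k_max` is hit) is a hypothetical stand-in, and the function name `adaptive_knn_predict` and its parameters are illustrative assumptions. The idea it shows is that samples in clean regions are decided with a small neighborhood, while samples near a decision boundary, where neighbor labels disagree, automatically receive a larger one.

```python
import math
from collections import Counter


def adaptive_knn_predict(train_X, train_y, query, k_min=3, k_max=15, purity=0.8):
    """Classify `query` with a per-query k (hypothetical purity-based rule, not DaNNG).

    Returns (predicted_label, k_used).
    """
    if not train_X:
        raise ValueError("training set must not be empty")
    # Sort training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    k_max = min(k_max, len(order))
    k = min(k_min, k_max)
    while True:
        # Majority vote among the current k nearest neighbors.
        votes = Counter(train_y[i] for i in order[:k])
        label, count = votes.most_common(1)[0]
        # Stop when the neighborhood is "pure enough" or k reaches its cap;
        # ambiguous (boundary) queries keep expanding their neighborhood.
        if count / k >= purity or k >= k_max:
            return label, k
        k += 1


# Toy data: two clusters plus two mixed points near the boundary.
train_X = [(0, 0), (0, 1), (1, 0), (1, 1),
           (5, 5), (5, 6), (6, 5), (6, 6),
           (3, 3), (3.2, 2.9)]
train_y = ["a", "a", "a", "a", "b", "b", "b", "b", "a", "b"]

print(adaptive_knn_predict(train_X, train_y, (0.2, 0.3)))  # → ('a', 3)
print(adaptive_knn_predict(train_X, train_y, (3.1, 3.0)))  # boundary query: k grows past k_min
```

The query at (0.2, 0.3) sits inside the "a" cluster, so its three nearest neighbors agree and k stays at `k_min`. The query at (3.1, 3.0) lies between the clusters, its neighbors' labels are mixed, and the rule enlarges k. A distribution-informed method would replace the purity test with a criterion derived from an estimate of the data distribution.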