$k$-nearest neighbor classification is a popular non-parametric method because of desirable properties such as automatic adaptation to distributional scale changes. Unfortunately, it has thus far proved difficult to design active learning strategies for training local voting-based classifiers that naturally retain these properties, and hence active learning strategies for $k$-nearest neighbor classification have been conspicuously missing from the literature. In this work, we introduce a simple and intuitive active learning algorithm for training $k$-nearest neighbor classifiers, the first in the literature that retains the concept of the $k$-nearest neighbor vote at prediction time. We provide consistency guarantees for a modified $k$-nearest neighbors classifier trained on samples acquired via our scheme, and show that when the conditional probability function $\mathbb{P}(Y=y \mid X=x)$ is sufficiently smooth and the Tsybakov noise condition holds, our actively trained classifiers converge to the Bayes optimal classifier at a faster asymptotic rate than passively trained $k$-nearest neighbor classifiers.
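For reference, the plain $k$-nearest neighbor vote that the abstract refers to can be sketched as follows. This is a minimal illustration of the standard, passively trained classifier only; it is not the active learning scheme or the modified classifier introduced in this work. The function name `knn_predict`, the Euclidean metric, the tie-breaking rule, and the toy data are assumptions made for the example.

```python
import numpy as np


def knn_predict(X_train, y_train, x_query, k=3):
    """Predict a label for x_query by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    # Euclidean distance from the query to every training point (assumed metric).
    dists = np.linalg.norm(X_train - np.asarray(x_query, dtype=float), axis=1)
    # Indices of the k closest training points; a stable sort breaks distance ties by index.
    nearest = np.argsort(dists, kind="stable")[:k]
    # Majority vote; np.unique returns sorted labels, so vote ties go to the smallest label.
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]


# Toy data: two well-separated clusters with labels 0 and 1.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [1.1, 1.0], [1.0, 1.1]]
y = [0, 0, 0, 1, 1, 1]

print(knn_predict(X, y, [0.05, 0.05], k=3))
print(knn_predict(X, y, [1.05, 1.05], k=3))
```

Because the vote depends only on the ordering of distances, rescaling every feature by the same positive constant leaves the prediction unchanged, which is one simple way to see the scale adaptivity mentioned above.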