The problem of nearest neighbor condensing has a long history of study, in both its theoretical and practical aspects. In this paper, we introduce the problem of weighted distance nearest neighbor condensing, where one assigns a weight to each point of the condensed set, and new points are then labeled according to their weighted distance nearest neighbor in the condensed set. We study the theoretical properties of this new model, and show that it can produce dramatically better condensing than the standard nearest neighbor rule, yet is characterized by generalization bounds almost identical to those of the latter. We then suggest a condensing heuristic for our new problem. We demonstrate Bayes consistency for this heuristic, and also show promising empirical results.
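To make the labeling rule concrete, the following is a minimal sketch of weighted distance nearest neighbor classification over a condensed set. The abstract does not specify how weights enter the distance, so the sketch assumes the weighted distance from a query to a condensed point is the Euclidean distance divided by that point's weight (a larger weight gives a point a larger region of influence). The function names `weighted_nn_label` and `plain_nn_label`, the data layout, and the toy points are all hypothetical and chosen only for illustration.

```python
import math


def weighted_nn_label(query, condensed):
    """Label `query` by its weighted distance nearest neighbor.

    `condensed` is a list of (point, weight, label) triples.
    Assumption: weighted distance = Euclidean distance / weight.
    """
    best_label, best_score = None, math.inf
    for point, weight, label in condensed:
        score = math.dist(query, point) / weight
        if score < best_score:
            best_label, best_score = label, score
    return best_label


def plain_nn_label(query, condensed):
    """Standard nearest neighbor rule: every weight is treated as 1."""
    unweighted = [(point, 1.0, label) for point, _, label in condensed]
    return weighted_nn_label(query, unweighted)


# Toy condensed set on the real line (hypothetical data).
condensed = [
    ((0.0,), 1.0, "a"),
    ((10.0,), 4.0, "b"),
]

query = (4.0,)
print(plain_nn_label(query, condensed))     # distances 4 vs 6
print(weighted_nn_label(query, condensed))  # scores 4/1 vs 6/4
```

In this toy example the two rules disagree on the query at 4.0: the unweighted rule picks the point at 0, while the heavier point at 10 wins under the weighted distance. This is the extra flexibility that a weighted condensed set can exploit, since a single heavily weighted point can cover a region that would otherwise require several unweighted points.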