For a Bayes classifier whose input space is a graph, we study the structure of the boundary, which comprises those points for which at least one neighbor is classified differently. The scientific setting is the assignment of DNA reads produced by next-generation sequencers to candidate source genomes. We show that the boundary is both large and complicated in structure. A new measure of uncertainty, Neighbor Similarity, compares the classifier result for an input point to the distribution of results for its neighbors. It not only tracks two uncertainty measures inherent to the Bayes classifier but can also be computed for classifiers that lack an inherent measure of uncertainty.
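The abstract does not give exact definitions, so what follows is only a minimal sketch under stated assumptions. It treats reads as strings over `ACGT`, takes a read's graph neighbors to be all reads one base substitution away (Hamming distance 1), and assumes Neighbor Similarity is the fraction of neighbors that receive the same label as the read itself. The toy Bayes classifier (uniform prior over genomes, uniform read start position, symmetric per-base substitution error rate), its function names, and the genome strings are all hypothetical and used only for illustration. Because `is_boundary` and `neighbor_similarity` take any labeling function, they do not rely on the classifier exposing posterior probabilities.

```python
from collections import Counter

BASES = "ACGT"


def hamming_neighbors(read):
    """Yield every read that differs from `read` by exactly one base substitution."""
    for i, base in enumerate(read):
        for b in BASES:
            if b != base:
                yield read[:i] + b + read[i + 1:]


def read_likelihood(read, genome, error_rate):
    """P(read | genome): uniform start position, independent per-base substitution errors."""
    starts = len(genome) - len(read) + 1
    if starts <= 0:
        raise ValueError("read is longer than genome")
    total = 0.0
    for s in range(starts):
        p = 1.0
        for r, g in zip(read, genome[s:s + len(read)]):
            p *= (1 - error_rate) if r == g else error_rate / 3
        total += p
    return total / starts


def bayes_classify(read, genomes, error_rate=0.01):
    """Hypothetical toy Bayes classifier: maximum posterior under a uniform prior."""
    return max(genomes, key=lambda name: read_likelihood(read, genomes[name], error_rate))


def is_boundary(read, classify):
    """True if at least one neighbor of `read` is classified differently."""
    label = classify(read)
    return any(classify(n) != label for n in hamming_neighbors(read))


def neighbor_similarity(read, classify):
    """Assumed form: fraction of neighbors assigned the same label as `read`."""
    label = classify(read)
    counts = Counter(classify(n) for n in hamming_neighbors(read))
    return counts[label] / sum(counts.values())


if __name__ == "__main__":
    # Hypothetical candidate genomes, far shorter than real ones.
    genomes = {"g1": "ACGTACGTAC", "g2": "TTGCAAGGCT"}
    classify = lambda r: bayes_classify(r, genomes)
    read = "ACGTA"
    print(classify(read), is_boundary(read, classify), neighbor_similarity(read, classify))
```

Under this reading, a read is on the boundary exactly when its Neighbor Similarity is below 1, and lower values indicate that more of the read's neighbors are assigned elsewhere.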