The goal of this thesis is to study the use of the Kantorovich-Rubinstein distance to build a descriptor of sample complexity in classification problems. The idea is to exploit the fact that the Kantorovich-Rubinstein distance is a metric on the space of measures that also takes into account the geometry and topology of the underlying metric space. We associate a measure with each class of points and study the geometric information that can be obtained from the Kantorovich-Rubinstein distance between these measures. We show that a large Kantorovich-Rubinstein distance between these measures allows us to conclude that there exists a 1-Lipschitz classifier that classifies the points of these classes well. We also discuss the limitations of the Kantorovich-Rubinstein distance as a descriptor.
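The link between a large distance and a good 1-Lipschitz classifier rests on Kantorovich-Rubinstein duality: W1(μ, ν) = sup { ∫ f dμ − ∫ f dν : f is 1-Lipschitz }. The sketch below illustrates this duality only, not the construction used in the thesis. It assumes one-dimensional data, uniform empirical measures and equal class sizes, none of which the abstract specifies. It computes W1 by sorted matching and builds a 1-Lipschitz function f whose class averages differ by exactly W1. The helper names `w1_sorted`, `empirical_cdf`, `dual_potential` and `mean` are hypothetical.

```python
from bisect import bisect_right


def w1_sorted(xs, ys):
    """W1 between two uniform empirical measures on the real line.

    Assumes both samples have the same size; in 1D the optimal coupling
    matches the i-th smallest point of one sample with the i-th smallest
    point of the other.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def empirical_cdf(sample):
    """Right-continuous empirical CDF t -> #{x <= t} / n."""
    s = sorted(sample)
    n = len(s)
    return lambda t: bisect_right(s, t) / n


def dual_potential(xs, ys):
    """Hypothetical helper: a 1-Lipschitz f attaining the KR supremum in 1D.

    f is piecewise linear with slope sign(F_ys - F_xs) between consecutive
    sample points, so mean f(xs) - mean f(ys) equals the integral of
    |F_xs - F_ys|, which is W1 in one dimension.
    """
    F_x, F_y = empirical_cdf(xs), empirical_cdf(ys)
    knots = sorted(set(xs) | set(ys))
    # Slope of f on [knots[k], knots[k+1]); both CDFs are constant there.
    slopes = []
    for k in range(len(knots) - 1):
        d = F_y(knots[k]) - F_x(knots[k])
        slopes.append((d > 0) - (d < 0))
    # Values of f at the knots, normalised so that f(knots[0]) = 0.
    values = [0.0]
    for k, slope in enumerate(slopes):
        values.append(values[-1] + slope * (knots[k + 1] - knots[k]))

    def f(t):
        # f is constant outside the support, where the two CDFs agree.
        if t <= knots[0]:
            return 0.0
        if t >= knots[-1]:
            return values[-1]
        k = bisect_right(knots, t) - 1
        return values[k] + slopes[k] * (t - knots[k])

    return f


def mean(g, sample):
    return sum(g(t) for t in sample) / len(sample)


if __name__ == "__main__":
    xs, ys = [0.0, 1.0, 2.0], [3.0, 4.0, 5.0]  # two toy classes
    f = dual_potential(xs, ys)
    print(w1_sorted(xs, ys))          # 3.0
    print(mean(f, xs) - mean(f, ys))  # 3.0 (the dual gap equals W1)
```

For the toy classes, f(t) = −t on [0, 5], so the class averages of f differ by the full distance 3, and thresholding f at −2.5 separates the two classes. This only illustrates the duality in a favourable case; it makes no claim about misclassification rates in general.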