We consider the problem of Neyman-Pearson classification, which models unbalanced classification settings where the error with respect to a distribution $\mu_1$ is to be minimized subject to low error with respect to a different distribution $\mu_0$. Given a fixed VC class $\mathcal{H}$ of classifiers over which to minimize, we provide a full characterization of the possible distribution-free rates, i.e., minimax rates over the space of all pairs $(\mu_0, \mu_1)$. The rates involve a dichotomy between hard and easy classes $\mathcal{H}$, characterized by a simple geometric condition, a three-points-separation condition, that is loosely related to VC dimension.
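For concreteness, one common way to formalize this setup is sketched here. The risk notation $R_{\mu}(h)$ and the tolerance level $\alpha \in [0, 1)$ are assumptions introduced for illustration and are not taken from the text above:

$$
\min_{h \in \mathcal{H}} \; R_{\mu_1}(h) \quad \text{subject to} \quad R_{\mu_0}(h) \le \alpha,
\qquad
R_{\mu_0}(h) = \mu_0\bigl(h(X) \neq 0\bigr), \quad R_{\mu_1}(h) = \mu_1\bigl(h(X) \neq 1\bigr).
$$

Under this reading, $R_{\mu_0}$ plays the role of a type-I error that must stay below $\alpha$, while $R_{\mu_1}$ is the type-II error being minimized over $\mathcal{H}$.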