Given a classification problem and a family of classifiers, the Rashomon ratio measures the proportion of classifiers in the family whose loss falls below a given threshold. Previous work has explored the advantages of a large Rashomon ratio when the family of classifiers is finite. Here we consider the more general case of an infinite family. We show that when the Rashomon ratio is large, choosing the classifier with the best empirical accuracy among a random subset of the family, a restriction that is likely to improve generalizability, does not increase the empirical loss by much. To illustrate situations in which the Rashomon ratio is large, we quantify it in two examples involving infinite classifier families. In the first example, we estimate the Rashomon ratio for the classification of normally distributed classes with an affine classifier. In the second, we obtain a lower bound on the Rashomon ratio of a classification problem with a modified Gram matrix when the classifier family consists of two-layer ReLU neural networks. More generally, we show that the Rashomon ratio can be estimated from a training dataset together with random samples from the classifier family, and we provide guarantees that this estimate is close to the true value of the Rashomon ratio.
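The sampling-based estimate described above can be sketched in a few lines. This is a minimal illustration and not the paper's procedure. It makes several assumptions: the classifier family is the set of affine classifiers sign(w·x + b), the measure on the family is uniform on the unit sphere of parameters (w, b), the loss is the empirical 0-1 loss, and the data are two synthetic Gaussian classes. The names `estimate_rashomon_ratio`, `hoeffding_samples`, and `theta` are hypothetical. The sample-size helper uses the standard Hoeffding inequality, which controls the Monte Carlo error for a fixed dataset and may differ from the guarantees derived in the paper.

```python
import math
import numpy as np


def zero_one_losses(X, y, W, b):
    """Empirical 0-1 loss of each affine classifier sign(w.x + b) on (X, y)."""
    # preds has shape (n_points, n_classifiers), entries in {-1, +1}
    preds = np.where(X @ W.T + b >= 0, 1, -1)
    return (preds != y[:, None]).mean(axis=0)


def estimate_rashomon_ratio(X, y, theta, n_classifiers=5000, seed=0):
    """Monte Carlo estimate of the fraction of classifiers with loss < theta.

    Assumption: classifiers are drawn uniformly from the unit sphere of
    parameters (w, b); the measure used in the paper may be different.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    params = rng.standard_normal((n_classifiers, d + 1))
    params /= np.linalg.norm(params, axis=1, keepdims=True)
    W, b = params[:, :d], params[:, d]
    losses = zero_one_losses(X, y, W, b)
    return (losses < theta).mean(), losses


def hoeffding_samples(eps, delta):
    """Number of sampled classifiers so that the Monte Carlo estimate is
    within eps of its expectation with probability at least 1 - delta
    (standard Hoeffding bound for a fixed dataset)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


# Synthetic data (assumption): two unit-variance Gaussian classes in the plane.
data_rng = np.random.default_rng(1)
n_per_class = 200
X = np.vstack([
    data_rng.normal(-2.0, 1.0, (n_per_class, 2)),
    data_rng.normal(2.0, 1.0, (n_per_class, 2)),
])
y = np.concatenate([-np.ones(n_per_class), np.ones(n_per_class)]).astype(int)

ratio, losses = estimate_rashomon_ratio(X, y, theta=0.1)
print(f"estimated Rashomon ratio at theta=0.1: {ratio:.3f}")
print(f"best empirical loss in the random subset: {losses.min():.3f}")
print(f"samples for eps=0.01, delta=0.05: {hoeffding_samples(0.01, 0.05)}")
```

The same sampled losses serve both purposes mentioned in the abstract: their minimum is the empirical loss of the best classifier in the random subset, and the fraction below `theta` is the ratio estimate.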