One of the key objects of binary classification is the regression function, i.e., the conditional expectation of the class label given the input. The regression function not only determines a Bayes optimal classifier but also encodes the corresponding misclassification probabilities. The paper presents a resampling framework for constructing exact, distribution-free and non-asymptotically guaranteed confidence regions for the true regression function at any user-chosen confidence level. Then, specific algorithms are proposed to demonstrate the framework. It is proved that the constructed confidence regions are strongly consistent, that is, any false model is excluded in the long run with probability one. The exclusion is also quantified with probably approximately correct (PAC) type bounds. Finally, the algorithms are validated via numerical experiments, and the methods are compared to approximate asymptotic confidence ellipsoids.
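To make the link between the regression function, the Bayes optimal classifier and the misclassification probabilities concrete, here is a minimal sketch. It assumes labels Y ∈ {−1, +1}, so that f(x) = E[Y | X = x] = 2·P(Y = +1 | X = x) − 1; the function names `bayes_classifier` and `conditional_error` and the `tanh` model are hypothetical illustrations, not taken from the paper, and the sketch does not implement the paper's confidence-region construction.

```python
import math
from typing import Callable

# Hypothetical helpers; labels are assumed to be in {-1, +1}.

def bayes_classifier(f: Callable[[float], float], x: float) -> int:
    """Bayes optimal label at x, given the regression function f(x) = E[Y | X = x]."""
    # P(Y = +1 | X = x) >= 1/2 exactly when f(x) >= 0; ties are broken toward +1.
    return 1 if f(x) >= 0 else -1


def conditional_error(f: Callable[[float], float], x: float) -> float:
    """Misclassification probability of the Bayes classifier at x."""
    # P(Y = +1 | X = x) = (1 + f(x)) / 2, so the Bayes error at x is
    # min(p, 1 - p) = (1 - |f(x)|) / 2.
    return (1 - abs(f(x))) / 2


if __name__ == "__main__":
    f = math.tanh  # hypothetical regression function with values in (-1, 1)
    for x in (-2.0, 0.0, 0.5):
        print(x, bayes_classifier(f, x), round(conditional_error(f, x), 4))
```

The sign of f(x) alone gives the Bayes decision, while its magnitude |f(x)| measures how reliable that decision is at x, which is why a confidence region for f carries information about both.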