Nonparametric estimation of smooth boundaries has been studied extensively in the conventional setting, where each observation is a pair consisting of a single covariate and a response variable. However, data collection in this setting is often costly. Recent years have seen steady development of learning algorithms for binary classification problems in which one instead observes paired covariates together with a binary variable representing the statistical relationship between them. In this work, we study theoretically whether multiple smooth boundaries are learnable in the pairwise binary classification setting. We approach the question through the statistical dependence of paired covariates and develop a learning algorithm based on vector-valued functions. The main theorem shows that there exists an empirical risk minimization algorithm over a class of deep ReLU networks that produces a consistent estimator of indicator functions defined by smooth boundaries. We also discuss how the pairwise binary classification setting differs from the conventional settings, focusing on the structural condition on the function classes. As a by-product, we apply the main theorem to a multiclass nonparametric classification problem in which estimation performance is measured by the excess misclassification risk.