Robustness under perturbation and contamination is a prominent issue in statistical learning. We address robust nonlinear regression based on the so-called interval conditional value-at-risk (In-CVaR), which enhances robustness by trimming extreme losses. While recent literature shows that In-CVaR based statistical learning exhibits robustness superior to that of classical robust regression models, a theoretical robustness analysis for nonlinear regression remains largely unexplored. We rigorously quantify robustness under contamination through a unified study of the distributional breakdown point for a broad class of regression models, including linear, piecewise affine, and neural network models with $\ell_1$, $\ell_2$, and Huber losses. Moreover, we analyze the qualitative robustness of the In-CVaR based estimator under perturbation. We show that, under several mild assumptions, the In-CVaR based estimator is qualitatively robust with respect to the Prokhorov metric if and only if the largest portion of losses is trimmed. Overall, this study analyzes the robustness properties of In-CVaR based nonlinear regression models under both perturbation and contamination, and illustrates the advantages of the In-CVaR risk measure over conditional value-at-risk and expectation for robust regression, both in theory and in numerical experiments.
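To make the trimming idea concrete, the following minimal sketch computes an empirical interval CVaR as the average of losses whose ranks fall between two quantile levels, discarding the largest fraction of losses. The function name, the quantile parameters `alpha` and `beta`, and the discretization of the quantile levels are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def empirical_in_cvar(losses, alpha, beta):
    """Average the sorted losses between the alpha- and beta-quantile
    ranks, trimming roughly the largest (1 - beta) fraction of losses.
    This is one common empirical formulation, assumed for illustration."""
    losses = np.sort(np.asarray(losses, dtype=float))
    n = len(losses)
    lo = int(np.floor(alpha * n))   # drop the smallest alpha-fraction
    hi = int(np.ceil(beta * n))     # drop the largest (1 - beta)-fraction
    return losses[lo:hi].mean()

# With beta < 1 the extreme loss 100 is trimmed; with beta = 1 it is not.
print(empirical_in_cvar([1, 2, 3, 4, 100], 0.0, 0.8))  # mean of [1, 2, 3, 4]
print(empirical_in_cvar([1, 2, 3, 4, 100], 0.0, 1.0))  # plain mean, outlier included
```

Minimizing such a trimmed average of residual losses over model parameters, instead of the plain mean, is what yields the robustness to contamination studied in the paper.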