Training machine learning and statistical models often involves optimizing a data-driven risk criterion. The risk is usually computed with respect to the empirical data distribution, but under distributional uncertainty this can lead to poor and unstable out-of-sample performance. In the spirit of distributionally robust optimization, we propose a novel robust criterion that combines insights from Bayesian nonparametric (i.e., Dirichlet Process) theory with recent decision-theoretic models of smooth ambiguity-averse preferences. First, we highlight novel connections with standard regularized empirical risk minimization techniques, including Ridge and LASSO regression. Then, we establish favorable finite-sample and asymptotic statistical guarantees on the performance of the robust optimization procedure. For practical implementation, we propose and study tractable approximations of the criterion based on well-known Dirichlet Process representations. We also show that the smoothness of the criterion makes it directly amenable to standard gradient-based numerical optimization. Finally, we provide insights into the workings of our method by applying it to high-dimensional sparse linear regression, binary classification, and robust location parameter estimation tasks.
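To make the last two methodological points concrete, the following is a minimal sketch of how a Dirichlet Process posterior could be approximated by truncated stick-breaking and how a smooth criterion could then be minimized by plain gradient descent. The abstract does not state the criterion explicitly, so everything specific here is an assumption made for illustration: the exponential transform phi(t) = beta * (exp(t / beta) - 1), the squared-error loss, the standard normal base measure, and all function and parameter names (`sample_dp_posterior`, `robust_criterion_and_grad`, `fit`, `alpha`, `beta`, `n_draws`, `n_atoms`) are hypothetical and not taken from the paper.

```python
import numpy as np


def sample_dp_posterior(X, y, alpha, n_draws, n_atoms, rng):
    """Truncated stick-breaking draws from a DP posterior (illustrative).

    Assumes a DP prior with concentration `alpha` and a standard normal
    base measure over (x, y); both choices are assumptions of this sketch.
    """
    n, d = X.shape
    # Stick-breaking fractions with posterior concentration alpha + n.
    v = rng.beta(1.0, alpha + n, size=(n_draws, n_atoms))
    v[:, -1] = 1.0  # close the last stick so each weight vector sums to one
    remaining = np.cumprod(1.0 - v, axis=1)
    w = v * np.concatenate([np.ones((n_draws, 1)), remaining[:, :-1]], axis=1)
    # Atoms come from the posterior base measure: an observed point with
    # probability n / (alpha + n), otherwise a draw from the base measure.
    from_data = rng.random((n_draws, n_atoms)) < n / (alpha + n)
    idx = rng.integers(0, n, size=(n_draws, n_atoms))
    Xa = np.where(from_data[..., None], X[idx],
                  rng.standard_normal((n_draws, n_atoms, d)))
    ya = np.where(from_data, y[idx], rng.standard_normal((n_draws, n_atoms)))
    return Xa, ya, w


def robust_criterion_and_grad(theta, Xa, ya, w, beta):
    """Monte Carlo average of phi(expected loss) over posterior draws.

    phi(t) = beta * (exp(t / beta) - 1) is an assumed smooth, convex,
    increasing transform; the loss is squared error for a linear model.
    """
    resid = Xa @ theta - ya                      # (n_draws, n_atoms)
    risk = np.sum(w * resid ** 2, axis=1)        # expected loss per draw
    phi = beta * (np.exp(risk / beta) - 1.0)
    grad_risk = 2.0 * np.einsum("ij,ij,ijk->ik", w, resid, Xa)
    # Chain rule: phi'(t) = exp(t / beta).
    grad = np.mean(np.exp(risk / beta)[:, None] * grad_risk, axis=0)
    return phi.mean(), grad


def fit(X, y, alpha=1.0, beta=10.0, n_draws=50, n_atoms=300,
        lr=0.02, steps=300, seed=0):
    """Plain gradient descent on the approximate criterion."""
    rng = np.random.default_rng(seed)
    Xa, ya, w = sample_dp_posterior(X, y, alpha, n_draws, n_atoms, rng)
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        _, grad = robust_criterion_and_grad(theta, Xa, ya, w, beta)
        theta -= lr * grad
    return theta
```

In this sketch, a smaller `beta` penalizes posterior draws with high expected loss more heavily, while a very large `beta` makes the transform nearly linear, so the objective approaches an average of expected losses over posterior draws. The truncation level `n_atoms` should be large relative to `alpha + n`, since the mass left on the final atom shrinks only geometrically.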