Deep regression learning with optimal loss function

In this paper, we develop a novel, efficient, and robust nonparametric regression estimator within a feedforward neural network framework. The proposed estimator has several notable characteristics. First, the loss function is built upon an estimated maximum likelihood function, which integrates information from the observed data as well as from the data structure. Consequently, the resulting estimator enjoys desirable optimality properties, such as efficiency. Second, unlike traditional maximum likelihood estimation (MLE), the proposed method avoids specifying the distribution and is therefore flexible enough to accommodate a wide variety of distributions, such as heavy-tailed, multimodal, or heterogeneous ones. Third, the proposed loss function relies on probabilities rather than on direct observations as in least squares, which contributes to the robustness of the proposed estimator. Finally, the proposed loss function involves only the nonparametric regression function. This enables direct application of existing software packages, simplifying computation and programming. We establish the large-sample properties of the proposed estimator in terms of its excess risk and its minimax near-optimal rate. The theoretical results demonstrate that the proposed estimator is equivalent to the true MLE in which the density function is known. Our simulation studies show that the proposed estimator outperforms existing methods in terms of prediction accuracy, efficiency, and robustness. In particular, it is comparable to the true MLE and even outperforms it as the sample size increases. This implies that the adaptive, data-driven loss function based on the estimated density may offer an additional avenue for capturing valuable information. We further apply the proposed method to four real data examples, where it yields significantly lower out-of-sample prediction errors than existing methods.
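To make the idea of an estimated-likelihood loss concrete, here is a minimal NumPy sketch under stated assumptions; it is not the authors' implementation. It replaces the neural network with a one-parameter linear model y = a + b·x, re-estimates the error density at each candidate slope with a leave-one-out Gaussian kernel density estimate of the residuals, and takes the loss to be the mean negative log of that estimated density. The bandwidth (0.2), the slope grid, the bimodal simulated noise, and the helper names `kde_nll` and `fit_slope` are all hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical settings for illustration only (not taken from the paper).
BANDWIDTH = 0.2
SLOPE_GRID = np.linspace(0.0, 4.0, 401)


def kde_nll(residuals, bandwidth):
    """Mean negative log of a leave-one-out Gaussian KDE evaluated at each residual."""
    r = np.asarray(residuals, dtype=float)
    n = r.size
    z = (r[:, None] - r[None, :]) / bandwidth          # pairwise scaled differences
    k = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)     # Gaussian kernel values
    np.fill_diagonal(k, 0.0)                           # leave each point out of its own estimate
    density = k.sum(axis=1) / ((n - 1) * bandwidth)
    return -np.mean(np.log(np.maximum(density, 1e-300)))  # floor avoids log(0)


def fit_slope(x, y, bandwidth=BANDWIDTH, grid=SLOPE_GRID):
    """Grid search for the slope minimizing the estimated-likelihood loss."""
    losses = [kde_nll(y - b * x, bandwidth) for b in grid]
    b_hat = grid[int(np.argmin(losses))]
    # The loss sees residuals only through pairwise differences, so the intercept
    # is not identified; recover it afterwards, assuming the errors have mean zero.
    a_hat = np.mean(y - b_hat * x)
    return a_hat, b_hat


# Simulated data with bimodal noise (hypothetical example).
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-3.0, 3.0, n)
eps = rng.choice([-2.0, 2.0], size=n) + rng.normal(0.0, 0.3, n)
y = 1.0 + 2.0 * x + eps

a_kde, b_kde = fit_slope(x, y)
b_ols, a_ols = np.polyfit(x, y, 1)  # least-squares fit for comparison
print(f"estimated-likelihood loss: a={a_kde:.3f}, b={b_kde:.3f}")
print(f"least squares:             a={a_ols:.3f}, b={b_ols:.3f}")
```

Because this loss depends on the residuals only through their pairwise differences, it cannot identify an intercept, so the sketch recovers one afterwards from the residual mean, which assumes mean-zero errors. With bimodal noise like this, least squares remains unbiased but ignores the two-mode structure, whereas a density-based loss can exploit it. In a neural network setting the same kind of loss would be minimized by gradient-based training rather than a grid search.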

翻译：本文在前馈神经网络框架下提出了一种新型高效且稳健的非参数回归估计量。该估计量具有若干显著特性：第一，损失函数基于估计的最大似然函数构建，该函数整合了观测数据信息及数据结构信息，从而使所得估计量具备效率等理想的最优性质；第二，与传统最大似然估计（MLE）不同，所提方法无需指定分布形式，因此对重尾分布、多峰分布或异质分布等任意分布类型均具备灵活性；第三，所提损失函数依赖于概率而非最小二乘法中的直接观测值，从而增强了估计量的稳健性；第四，该损失函数仅涉及非参数回归函数，便于直接应用现有软件包，简化了计算与编程过程。我们从超额风险及极小化近优速率角度建立了所提估计量的大样本性质，理论结果表明该估计量等价于密度函数已知情形下的真实最大似然估计。仿真研究表明，所提估计量在预测精度、效率及稳健性方面均优于现有方法，尤其与真实最大似然估计相当，且随样本量增大表现更优，这表明基于估计密度的自适应数据驱动损失函数或为捕获有价值信息提供了新途径。我们将所提方法应用于四个实际数据案例，相较于现有方法显著降低了样本外预测误差。