In theory, the conditional expectation of a square-integrable random variable $Y$ given a $d$-dimensional random vector $X$ can be obtained by minimizing the mean squared distance between $Y$ and $f(X)$ over all Borel measurable functions $f \colon \mathbb{R}^d \to \mathbb{R}$. In many applications, however, this minimization problem cannot be solved exactly. Instead, one has to use a numerical method that computes an approximate minimum over a suitable subfamily of Borel functions. The quality of the result depends on the adequacy of the subfamily and on the performance of the numerical method. In this paper, we derive an expected value representation of the minimal mean squared distance which, in many applications, can be approximated efficiently with a standard Monte Carlo average. This enables us to provide accuracy guarantees for any numerical approximation of a given conditional expectation. We illustrate the method by assessing the quality of approximate conditional expectations obtained by linear, polynomial and neural network regression in several concrete examples.
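The abstract does not spell out the representation. The following is a minimal sketch of one standard identity of this kind, not necessarily the paper's exact construction. It assumes that one can simulate two copies $Y, Y'$ that are conditionally independent given $X$. Then $\mathbb{E}[YY' \mid X] = \mathbb{E}[Y \mid X]^2$, so the minimal mean squared distance equals $\mathbb{E}[Y^2] - \mathbb{E}[YY']$, which is a plain expectation that can be estimated by Monte Carlo. By orthogonality, subtracting it from the mean squared distance of any candidate $f$ estimates $\mathbb{E}[(f(X) - \mathbb{E}[Y \mid X])^2]$, that is, the squared $L^2$ error of $f$. The toy model ($Y = \sin X + Z$ with $X, Z$ standard normal) and the helper names `sample`, `min_mse_estimate` and `mse_estimate` are hypothetical and chosen so that the true answer is known: the minimal mean squared distance is exactly 1.

```python
import numpy as np

def sample(rng, n):
    """Hypothetical toy model: X ~ N(0, 1), Y = sin(X) + Z with Z ~ N(0, 1).
    Returns X and two copies of Y that are conditionally independent given X."""
    x = rng.standard_normal(n)
    y = np.sin(x) + rng.standard_normal(n)
    y_prime = np.sin(x) + rng.standard_normal(n)
    return x, y, y_prime

def min_mse_estimate(y, y_prime):
    # E[(Y - E[Y|X])^2] = E[Y^2] - E[E[Y|X]^2] = E[Y^2] - E[Y Y'],
    # because E[Y Y' | X] = E[Y|X]^2 for conditionally independent copies.
    return np.mean(y * (y - y_prime))

def mse_estimate(f, x, y):
    # Monte Carlo estimate of E[(Y - f(X))^2] for a candidate approximation f.
    return np.mean((y - f(x)) ** 2)

rng = np.random.default_rng(0)

# Fit a linear regression f(x) = a + b x on an independent training sample.
x_tr, y_tr, _ = sample(rng, 10_000)
b, a = np.polyfit(x_tr, y_tr, deg=1)  # polyfit returns highest degree first
f = lambda x: a + b * x

# Fresh Monte Carlo sample used only for evaluation.
x, y, y_prime = sample(rng, 1_000_000)
min_mse = min_mse_estimate(y, y_prime)
mse_f = mse_estimate(f, x, y)
gap = mse_f - min_mse  # estimates E[(f(X) - E[Y|X])^2]
print(f"min MSE ~ {min_mse:.3f}, MSE of f ~ {mse_f:.3f}, "
      f"L2 error of f ~ {np.sqrt(max(gap, 0.0)):.3f}")
```

Using a sample for evaluation that is independent of the training sample keeps the estimate of the gap unbiased. The gap itself is only known up to Monte Carlo error, so in practice one would report it together with a confidence interval.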