Out-of-sample prediction is the acid test of predictive models, yet an independent test dataset is often not available for assessing the prediction error. For this reason, out-of-sample performance is commonly estimated using data-splitting algorithms such as cross-validation or the bootstrap. For quantitative outcomes, the ratio of explained variance to total variance is summarized by the coefficient of determination, or in-sample $R^2$, which is easy to interpret and to compare across different outcome variables. In contrast to the in-sample $R^2$, the out-of-sample $R^2$ has not been well defined, and the variability of the out-of-sample $\hat{R}^2$ has been largely ignored. Usually only its point estimate is reported, which hampers formal comparison of the predictability of different outcome variables. Here we explicitly define the out-of-sample $R^2$ as a comparison of two predictive models, provide an unbiased estimator, and exploit recent theoretical advances on the uncertainty of data-splitting estimates to provide a standard error for $\hat{R}^2$. The performance of the estimators of the $R^2$ and of its standard error is investigated in a simulation study. We demonstrate our new method by constructing confidence intervals and comparing models for the prediction of quantitative Brassica napus and Zea mays phenotypes based on gene expression data.
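As an illustration of what "a comparison of two predictive models" can mean, the sketch below estimates an out-of-sample $R^2$ by $K$-fold cross-validation as one minus the ratio of two cross-validated mean squared errors: that of a fitted model and that of a null model predicting the training-fold mean. This is a naive plug-in ratio under our own assumptions, not the unbiased estimator proposed here, and it provides no standard error. The function name `cv_out_of_sample_r2`, the ordinary least squares learner, and the simulated data are all hypothetical choices made for the example.

```python
import numpy as np


def cv_out_of_sample_r2(X, y, n_folds=5, seed=0):
    """Naive K-fold estimate of 1 - MSE(model) / MSE(null model).

    Hypothetical helper: the model is ordinary least squares with an
    intercept, the null model predicts the mean of the training fold.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    # Randomly partition the observation indices into n_folds test sets.
    folds = np.array_split(rng.permutation(n), n_folds)
    sq_err_model = np.empty(n)
    sq_err_null = np.empty(n)
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        X_train, y_train = X[train_mask], y[train_mask]
        # Fit least squares on the training fold only.
        A_train = np.column_stack([np.ones(X_train.shape[0]), X_train])
        coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
        # Squared prediction errors on the held-out fold for both models.
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        sq_err_model[test_idx] = (y[test_idx] - A_test @ coef) ** 2
        sq_err_null[test_idx] = (y[test_idx] - y_train.mean()) ** 2
    return 1.0 - sq_err_model.mean() / sq_err_null.mean()


# Simulated data (assumption): one informative predictor, two noise predictors.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y_signal = 2.0 * X[:, 0] + rng.normal(size=200)  # population R^2 = 0.8
y_noise = rng.normal(size=200)                   # unrelated to X

r2_signal = cv_out_of_sample_r2(X, y_signal)
r2_noise = cv_out_of_sample_r2(X, y_noise)
print(f"signal: {r2_signal:.2f}, noise: {r2_noise:.2f}")
```

Unlike the in-sample $R^2$, this quantity can be negative: when the predictors carry no information, the fitted model typically predicts held-out observations slightly worse than the training mean does.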