We consider the problem of estimating a high-dimensional covariance matrix from a small number of observations when covariates on pairs of variables are available and the variables may have spatial structure. The motivation is a problem in demography: estimating the covariance matrix of the total fertility rate (TFR) of 195 countries when only 11 observations are available. We construct an estimator for high-dimensional covariance matrices that exploits pairwise covariates (such as whether two variables belong to the same cluster), the spatial structure of the variables, and interactions between the covariates. We reformulate the problem as a mixed effects model, which requires estimating only a small number of parameters that are easy to interpret and can be selected using standard procedures. The estimator is consistent under general conditions, and asymptotically normal. It remains applicable when the mean and variance structure of the data is already specified or when some of the data are missing. We assess its performance using simulations, both under our model assumptions and under model misspecification, and find that it outperforms several popular alternatives. Finally, we apply it to the TFR dataset and draw some conclusions.
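To make the idea of a covariance matrix driven by a few pairwise-covariate parameters concrete, the following is a minimal sketch, not the estimator proposed here. It assumes a toy random-effects model with a single pairwise covariate (same-cluster membership), zero-mean data, and no spatial term, and it fits the parameters by ordinary least squares on the entries of the sample covariance matrix, a simple moment-matching stand-in for a mixed-model fit. All function names (`pairwise_basis`, `fit_structured_covariance`, `simulate`) and the cluster layout are hypothetical.

```python
import numpy as np


def pairwise_basis(cluster_labels):
    """Build one p x p matrix per pairwise covariate (illustrative choice).

    identity:      1 where i == j (variable-specific noise variance)
    same_cluster:  1 where variables i and j share a cluster label
    """
    labels = np.asarray(cluster_labels)
    identity = np.eye(labels.shape[0])
    same_cluster = (labels[:, None] == labels[None, :]).astype(float)
    return [identity, same_cluster]


def fit_structured_covariance(X, basis):
    """Fit Sigma = sum_k theta_k * M_k by least squares on covariance entries.

    X is an (n, p) data matrix assumed to have mean zero.
    Returns the parameter vector theta and the fitted p x p matrix.
    """
    n, _ = X.shape
    S = X.T @ X / n  # sample covariance under the zero-mean assumption
    A = np.column_stack([M.ravel() for M in basis])
    theta, *_ = np.linalg.lstsq(A, S.ravel(), rcond=None)
    sigma_hat = sum(t * M for t, M in zip(theta, basis))
    return theta, sigma_hat


def simulate(n, cluster_labels, var_noise, var_cluster, seed=0):
    """Draw n observations: a shared random effect per cluster plus noise."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(cluster_labels)
    n_clusters = int(labels.max()) + 1
    effects = rng.normal(0.0, np.sqrt(var_cluster), size=(n, n_clusters))
    noise = rng.normal(0.0, np.sqrt(var_noise), size=(n, labels.shape[0]))
    return effects[:, labels] + noise


# Dimensions mirror the TFR setting (p = 195, n = 11); the 5 equal-sized
# clusters are an arbitrary assumption made for this example.
labels = np.repeat(np.arange(5), 39)
X = simulate(n=11, cluster_labels=labels, var_noise=1.0, var_cluster=0.5)
theta, sigma_hat = fit_structured_covariance(X, pairwise_basis(labels))
print("estimated (noise variance, cluster variance):", theta)
```

With only two parameters to estimate, the fit is possible even though the 195 x 195 sample covariance matrix built from 11 observations is rank-deficient. The least-squares fit does not by itself guarantee a positive semi-definite result, which is one reason a likelihood-based mixed-model fit may be preferable in practice.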