Covariance matrix estimation is a central task in the analysis of multivariate data across scientific fields as diverse as neuroscience, genomics, and astronomy. However, modern scientific data are often incomplete for reasons beyond researchers' control, and missing data can preclude the use of traditional covariance estimation methods. Some existing methods address this problem by completing the data matrix, or by filling in the missing entries of an incomplete sample covariance matrix under a low-rank assumption. We propose a novel approach that exploits auxiliary variables to complete covariance matrix estimates. One example of an auxiliary variable is the distance between neurons, which is usually inversely related to the strength of neuronal correlation. Our method extracts auxiliary information via regression and involves a single tuning parameter that can be selected empirically. We compare our method with other matrix completion approaches in simulations and apply it to the analysis of large-scale neuroscience data.
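To make the idea of completing a covariance estimate from an auxiliary variable concrete, here is a minimal sketch. It is not the estimator proposed in the paper. It assumes that all variances are observed, that only some off-diagonal entries are missing (marked `NaN`), and that Fisher-transformed correlations are roughly linear in a single auxiliary variable such as inter-neuron distance. The function name `complete_covariance` and the linear model are illustrative assumptions, and the tuning parameter mentioned in the abstract is not modelled.

```python
import numpy as np

def complete_covariance(S, D):
    """Fill NaN off-diagonal entries of a symmetric covariance estimate S.

    Hypothetical helper (not from the paper): regress observed
    Fisher-transformed correlations on the auxiliary matrix D
    (e.g. pairwise distances) and predict the missing ones.
    """
    S = np.asarray(S, dtype=float)
    D = np.asarray(D, dtype=float)
    p = S.shape[0]

    sd = np.sqrt(np.diag(S))               # assumes all variances are observed
    R = S / np.outer(sd, sd)               # correlations; NaN entries stay NaN

    iu = np.triu_indices(p, k=1)           # work on the upper triangle only
    r, d = R[iu], D[iu]
    obs = ~np.isnan(r)

    # Fisher z-transform of observed correlations (clipped to stay finite).
    z = np.arctanh(np.clip(r[obs], -0.999, 0.999))

    # Ordinary least squares: z ~ beta0 + beta1 * distance.
    X = np.column_stack([np.ones(obs.sum()), d[obs]])
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)

    # Predict missing correlations and map back to (-1, 1).
    r_filled = r.copy()
    r_filled[~obs] = np.tanh(beta[0] + beta[1] * d[~obs])

    # Rebuild a symmetric correlation matrix with unit diagonal.
    R_full = np.eye(p)
    R_full[iu] = r_filled
    R_full = R_full + R_full.T - np.eye(p)

    return R_full * np.outer(sd, sd), beta
```

The completed matrix from this sketch is symmetric but not guaranteed to be positive semidefinite, so a practical pipeline would likely need an additional projection or regularization step.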