Repeated measurements, in which the same random variables are observed multiple times on each of several subjects, are common in many fields. Such data have an underlying hierarchical structure, and it is of interest to estimate the covariance and correlation at each level. Most existing methods for sparse covariance/correlation matrix estimation assume independent samples. Ignoring the hierarchical structure and the correlation within each subject can lead to erroneous scientific conclusions. In this paper, we study sparse and positive-definite estimation of the between-subject and within-subject covariance/correlation matrices for repeated measurements. Our estimators are solutions to convex optimization problems that can be solved efficiently. We establish estimation error rates for the proposed estimators and demonstrate their favorable performance through theoretical analysis and comprehensive simulation studies. We further apply our methods to construct between-subject and within-subject covariance graphs of clinical variables from hemodialysis patients.
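To make the two levels of covariance concrete, the following is a minimal sketch and not the estimators proposed in the paper. It assumes a balanced one-way random-effects model, X_ij = mu + b_i + e_ij, with n subjects and m repeated measurements per subject, where b_i has covariance Sigma_b (between-subject) and e_ij has covariance Sigma_w (within-subject). It computes the classical moment estimates and then applies off-diagonal soft-thresholding as a simple stand-in for a sparsity penalty. The function names are hypothetical, and unlike the convex programs described above, this sketch does not guarantee a positive-definite result.

```python
import numpy as np

def moment_estimates(X):
    """Moment estimates of between- and within-subject covariance.

    X has shape (n_subjects, n_repeats, p); a balanced design is assumed.
    """
    n, m, p = X.shape
    subject_means = X.mean(axis=1)                      # shape (n, p)
    resid = X - subject_means[:, None, :]               # deviations from subject means
    # Pooled within-subject covariance, n * (m - 1) degrees of freedom.
    S_within = np.einsum("ijk,ijl->kl", resid, resid) / (n * (m - 1))
    # Sample covariance of subject means; its expectation is Sigma_b + Sigma_w / m.
    centered = subject_means - subject_means.mean(axis=0)
    S_means = centered.T @ centered / (n - 1)
    S_between = S_means - S_within / m
    return S_between, S_within

def soft_threshold_offdiag(S, lam):
    """Soft-threshold the off-diagonal entries of S by lam; keep the diagonal."""
    T = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)
    T[np.diag_indices_from(T)] = np.diag(S)
    return T

# Usage on simulated data (all dimensions and parameter values are illustrative).
rng = np.random.default_rng(0)
n, m, p = 200, 5, 4
b = rng.normal(size=(n, 1, p))            # subject effects, Sigma_b = I
e = 0.5 * rng.normal(size=(n, m, p))      # within-subject noise, Sigma_w = 0.25 * I
S_b, S_w = moment_estimates(b + e)
S_b_sparse = soft_threshold_offdiag(S_b, lam=0.1)
S_w_sparse = soft_threshold_offdiag(S_w, lam=0.1)
```

The subtraction of S_within / m is what separates the two levels: treating all n * m rows as independent samples would instead estimate a mixture of Sigma_b and Sigma_w, which is the error the abstract warns against.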