In this paper, we propose two new algorithms for maximum-likelihood estimation (MLE) of high-dimensional sparse covariance matrices. Unlike most state-of-the-art methods, which impose sparsity either through regularization or by penalizing the likelihood, we solve the MLE problem based on an estimated covariance graph. More specifically, we propose a two-stage procedure. In the first stage, we determine the sparsity pattern of the target covariance matrix (in other words, the marginal independence structure encoded by the covariance graph under a Gaussian graphical model) using multiple hypothesis testing with false discovery rate (FDR) control. In the second stage, we use either a block coordinate descent approach to estimate the non-zero values, or a proximal distance approach that penalizes the distance between the estimated covariance graph and the target covariance matrix. This gives rise to two different methods, each with its own advantage: the coordinate descent approach requires no hyper-parameter tuning, whereas the proximal distance approach is computationally fast but requires careful tuning of the penalty parameter. Both methods remain effective when the number of observed samples is smaller than the dimension of the data. For performance evaluation, we test the proposed methods on both simulated and real-world data and show that they provide more accurate estimates of the sparse covariance matrix than two state-of-the-art methods.
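To make the first stage concrete, the following is a minimal sketch of how a sparsity pattern could be selected by FDR-controlled testing of marginal correlations. It rests on assumptions that the abstract does not state: each pair of variables is tested with a Fisher z-test of zero correlation, and FDR is controlled with the Benjamini-Hochberg step-up rule. The function names (`pearson`, `fisher_z_pvalue`, `benjamini_hochberg`, `estimate_support`) and the level `alpha=0.05` are hypothetical choices for illustration, not the authors' implementation.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_pvalue(r, n):
    """Two-sided p-value for H0: correlation = 0 (assumed test; needs n > 3)."""
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return booleans marking the hypotheses rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda k: pvalues[k])
    cutoff = 0  # largest 1-based rank k with p_(k) <= k * alpha / m
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            cutoff = rank
    reject = [False] * m
    for idx in order[:cutoff]:
        reject[idx] = True
    return reject


def estimate_support(rows, alpha=0.05):
    """Estimate covariance-graph edges (i, j), i < j, from n-by-p data rows."""
    n, p = len(rows), len(rows[0])
    cols = [[row[j] for row in rows] for j in range(p)]
    pairs = list(combinations(range(p), 2))
    pvals = [fisher_z_pvalue(pearson(cols[i], cols[j]), n) for i, j in pairs]
    keep = benjamini_hochberg(pvals, alpha)
    return {pair for pair, flag in zip(pairs, keep) if flag}


# Toy data (hypothetical): variable 1 tracks variable 0, the rest are independent.
random.seed(0)
n, p = 200, 6
rows = []
for _ in range(n):
    row = [random.gauss(0, 1) for _ in range(p)]
    row[1] = row[0] + 0.3 * random.gauss(0, 1)
    rows.append(row)

support = estimate_support(rows, alpha=0.05)
print(sorted(support))
```

Pairs absent from `support` would be constrained to zero in the second stage, and only the retained entries (plus the diagonal) would be estimated. Because each test involves a single pair of variables, this step needs only n > 3 samples and does not require n to exceed the dimension p.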