Capturing how the conditional covariances or correlations among the elements of a multivariate response vector vary with covariates is important in many fields, including neuroscience, epidemiology, and biomedicine. We propose a new method, Covariance Regression with Random Forests (CovRegRF), which uses a random forest framework to estimate the covariance matrix of a multivariate response given a set of covariates. The trees in the forest are built with a splitting rule specially designed to maximize the difference between the sample covariance matrix estimates of the child nodes. We also propose a significance test for the partial effect of a subset of covariates. We evaluate the proposed method and significance test in a simulation study, which shows that the method provides accurate covariance matrix estimates and that the Type I error rate is well controlled. An application of the proposed method to thyroid disease data is also presented. CovRegRF is implemented in a freely available R package on CRAN.
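To make the splitting idea concrete, the following is a minimal sketch of a covariance-based split search for a single numeric covariate. It is not the CovRegRF package's implementation: the abstract only states that the rule maximizes the difference between the child nodes' sample covariance matrices, so the specific score used here (Euclidean distance between the upper triangles of the two matrices, weighted by the square root of the product of the child sizes) is an assumption, as are the function names `sample_cov`, `cov_distance`, `split_score`, `best_split` and the `min_node` parameter.

```python
from math import sqrt


def sample_cov(rows):
    """Unbiased sample covariance matrix of a list of equal-length rows."""
    n, q = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(q)]
    return [
        [
            sum((r[j] - means[j]) * (r[k] - means[k]) for r in rows) / (n - 1)
            for k in range(q)
        ]
        for j in range(q)
    ]


def cov_distance(a, b):
    """Euclidean distance between the upper triangles (diagonal included)."""
    q = len(a)
    return sqrt(
        sum((a[j][k] - b[j][k]) ** 2 for j in range(q) for k in range(j, q))
    )


def split_score(x, y, threshold, min_node=3):
    """Score a split of responses y on covariate x at the given threshold.

    Assumed criterion: sqrt(n_left * n_right) * distance between the two
    child covariance matrices. Returns -inf if a child is too small.
    """
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    if len(left) < min_node or len(right) < min_node:
        return float("-inf")
    dist = cov_distance(sample_cov(left), sample_cov(right))
    return sqrt(len(left) * len(right)) * dist


def best_split(x, y, min_node=3):
    """Return (threshold, score) with the largest score, or None if no valid split."""
    candidates = sorted(set(x))[:-1]  # the largest value would leave the right child empty
    scored = [(t, split_score(x, y, t, min_node)) for t in candidates]
    scored = [ts for ts in scored if ts[1] != float("-inf")]
    return max(scored, key=lambda ts: ts[1]) if scored else None


# Toy data: the two responses move together for small x and in
# opposite directions for large x, so the covariance changes with x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [(1, 1), (-1, -1), (2, 2), (-2, -2), (1, -1), (-1, 1), (2, -2), (-2, 2)]

print(best_split(x, y))
```

In a forest, a search like this would be repeated over a random subset of covariates at each node, and the resulting trees would then be used to form covariate-dependent covariance estimates; those steps are outside the scope of this sketch.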