We propose an estimation procedure for covariation in wide compositional data sets. For compositions, the widely used logratio variables are interdependent because they share a common reference. Logratio uncorrelated compositions are linearly independent before the unit-sum constraint is imposed. We show how they can be used to construct bespoke shrinkage targets for logratio covariance matrices, and we test a simple procedure for partial correlation estimates on both a simulated and a single-cell gene expression data set. For the underlying counts, different zero imputations are evaluated. The partial correlation induced by the closure is derived analytically. Data and code are available from GitHub.
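As a rough illustration of the kind of pipeline summarized above, the following minimal sketch computes centered logratios from counts, shrinks their covariance, and reads partial correlations off the inverse. It rests on several assumptions that are not taken from the paper: the zero handling is a plain pseudocount, the shrinkage target is the generic diagonal of the sample covariance (not the bespoke targets built from logratio uncorrelated compositions), the shrinkage intensity `lam` is fixed by hand, and the function names `clr`, `shrink_covariance` and `partial_correlations` are hypothetical.

```python
import numpy as np

def clr(counts, pseudocount=0.5):
    """Centered logratio transform; the pseudocount is an assumed, simple zero replacement."""
    logx = np.log(np.asarray(counts, dtype=float) + pseudocount)
    # Subtract each sample's mean log value (the log of its geometric mean).
    return logx - logx.mean(axis=1, keepdims=True)

def shrink_covariance(z, lam=0.2):
    """Convex combination of the sample covariance and a diagonal target (assumed target)."""
    s = np.cov(z, rowvar=False)
    target = np.diag(np.diag(s))
    return (1.0 - lam) * s + lam * target

def partial_correlations(cov):
    """Partial correlations from the precision matrix: -P_ij / sqrt(P_ii * P_jj)."""
    prec = np.linalg.inv(cov)
    d = np.sqrt(np.diag(prec))
    pcor = -prec / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Synthetic "wide" data: 20 samples, 50 parts (more parts than samples).
rng = np.random.default_rng(0)
counts = rng.poisson(lam=5.0, size=(20, 50))

z = clr(counts)
pcor = partial_correlations(shrink_covariance(z))
```

The sample covariance of clr variables is singular, both because each clr row sums to zero and because there are fewer samples than parts here, so some regularization such as shrinkage is needed before inverting it.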