On combining estimated and analytic covariance matrices

The statistical analysis of cosmological data often assumes a Gaussian sampling distribution and relies on covariance matrices estimated from simulations. In this setting, the likelihood function of the data is not Gaussian but a multivariate Student-t distribution, which arises from marginalising over an inverse-Wishart distribution for the true covariance matrix. This framework, introduced by Sellentin & Heavens (2016) and extended by Percival et al. (2022), provides a principled drop-in replacement for the Gaussian likelihood with Hartlap correction (Hartlap et al. 2007). The Hartlap correction removes the bias in the precision matrix and is still widely used, even though it fails to reproduce the heavy tails of the true distribution and therefore yields inaccurate probabilities, especially when datasets are in tension. In practice, cosmological analyses frequently involve additional Gaussian error contributions, for example from instrumental noise, foregrounds, super-sample covariance, or emulator uncertainties. The resulting likelihood function is a convolution of the Sellentin-Heavens or Percival likelihood with the extra Gaussian contribution, and it has no simple closed-form expression. In this note, we derive an accurate approximation to the combined likelihood function: another multivariate Student-t distribution, which inherits the heavy tails. The parameters of this Student-t distribution are determined by matching its covariance and multivariate kurtosis to those of the true distribution. We also present a sampling algorithm based on the mixture representation of the Student-t distribution. It is slightly more expensive but still fast, and it avoids the approximation altogether; unlike the Student-t approximation derived here, however, it is not a drop-in replacement for the standard Gaussian or Hartlap likelihood function. (Abridged)
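
The mixture representation mentioned above can be illustrated with a minimal sketch. It is not the algorithm of the note itself, only the standard textbook construction: a multivariate Student-t variable is a Gaussian whose covariance is rescaled by a chi-squared mixing weight, so adding an independent Gaussian term to each draw samples the convolution directly. The function name `sample_t_plus_gaussian` and all argument names are hypothetical, and the degrees of freedom `nu` and scale matrix `scale` are assumed to be supplied by the user.

```python
import numpy as np


def sample_t_plus_gaussian(mean, scale, nu, extra_cov, size, rng):
    """Draw samples of (Student-t variable) + (independent Gaussian variable).

    Hypothetical helper, not taken from the note. Assumptions:
      mean      : (p,) location vector
      scale     : (p, p) positive-definite scale matrix of the Student-t part
      nu        : degrees of freedom of the Student-t part
      extra_cov : (p, p) positive-definite covariance of the Gaussian part
    """
    p = mean.shape[0]

    # Mixing weights: w = chi^2_nu / nu. Conditional on w, the Student-t
    # part is Gaussian with covariance scale / w.
    w = rng.chisquare(nu, size=size) / nu

    # Cholesky factors turn standard normal draws into correlated draws.
    chol_t = np.linalg.cholesky(scale)
    chol_g = np.linalg.cholesky(extra_cov)

    z_t = rng.standard_normal((size, p)) @ chol_t.T  # N(0, scale)
    z_g = rng.standard_normal((size, p)) @ chol_g.T  # N(0, extra_cov)

    # Rescale the first term by 1/sqrt(w) to get the heavy tails, then add
    # the untouched Gaussian term.
    return mean + z_t / np.sqrt(w)[:, None] + z_g
```

For `nu > 2` the covariance of the samples is `nu / (nu - 2) * scale + extra_cov`, which gives a quick sanity check on any implementation. Passing a seeded generator such as `np.random.default_rng(0)` makes the draws reproducible.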
