Towards a unified theory for testing statistical hypothesis: Multinormal mean with nuisance covariance matrix

Under a multinormal distribution with an arbitrary unknown covariance matrix, the main purpose of this paper is to propose a framework that reconciles Bayesian and frequentist inference, Fisher's approach of reporting $p$-values, Neyman-Pearson's optimal theory, and Wald's decision theory for problems of testing the mean against restricted alternatives (closed convex cones). To this end, we study tests constructed via the likelihood ratio (LR) and union-intersection (UI) principles. For testing against restricted alternatives, we first show that the LRT and the UIT are not proper Bayes tests. However, they are shown to be the integrated LRT and the integrated UIT, respectively. For testing against the positive orthant alternative, the null distributions of both the LRT and the UIT depend on the unknown nuisance covariance matrix, which makes Fisher's approach of reporting $p$-values difficult to adopt. On the other hand, under the definition of the level of significance, the LRT and the UIT are shown to be power-dominated by the corresponding LRT and UIT for testing against the half-space alternative, respectively. Hence both the LRT and the UIT are $\alpha$-inadmissible, a result that runs counter to common statistical intuition. Neither Fisher's approach of reporting $p$-values alone nor Neyman-Pearson's optimal theory of the power function alone is a satisfactory criterion for evaluating the performance of tests. Wald's decision theory via $d$-admissibility may shed light on resolving these challenging issues of balancing type I error and power.
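
The following Python sketch illustrates why the orthant null distribution depends on the nuisance covariance matrix. It rests on several assumptions:

- the null hypothesis is $H_0: \mu = 0$;
- the alternative is the positive orthant $\mu \geq 0$;
- the data are bivariate normal with unit variances and correlation $\rho$.

The statistic is the likelihood ratio obtained by maximizing out $\Sigma$: $U = (T_0 - T_1)/(1 + T_1)$, where $T_0 = n\bar{X}^\top A^{-1}\bar{X}$, $T_1 = \min_{\mu \geq 0} n(\bar{X}-\mu)^\top A^{-1}(\bar{X}-\mu)$, and $A$ is the sum-of-squares matrix. This form is derived here from the profile likelihood and is not quoted from the paper. The helper names `orthant_dist2`, `lrt_orthant` and `null_quantile`, as well as the sample size and replication count, are hypothetical choices.

```python
import itertools

import numpy as np


def orthant_dist2(xbar, W):
    """Return min over mu >= 0 of (xbar - mu)^T W (xbar - mu), W positive definite.

    Exact for small p: the minimizer lies in the relative interior of some face
    of the orthant, so we solve the unconstrained problem on every face and keep
    the best feasible candidate (2^p linear solves).
    """
    p = len(xbar)
    best = float(xbar @ W @ xbar)  # face with no free coordinates: mu = 0
    for k in range(1, p + 1):
        for free in itertools.combinations(range(p), k):
            F = list(free)                                # free coordinates of mu
            G = [j for j in range(p) if j not in free]    # coordinates fixed at 0
            if G:
                # Stationarity on the face: (W (xbar - mu))_F = 0
                rhs = W[np.ix_(F, G)] @ xbar[G]
                mu_F = xbar[F] + np.linalg.solve(W[np.ix_(F, F)], rhs)
            else:
                mu_F = xbar.copy()                        # interior: mu = xbar
            if np.all(mu_F >= 0):                         # keep feasible candidates only
                mu = np.zeros(p)
                mu[F] = mu_F
                d = xbar - mu
                best = min(best, float(d @ W @ d))
    return best


def lrt_orthant(X):
    """LR statistic U for H0: mu = 0 vs mu in the positive orthant, Sigma unknown."""
    n = X.shape[0]
    xbar = X.mean(axis=0)
    A = (X - xbar).T @ (X - xbar)    # sum-of-squares matrix
    W = n * np.linalg.inv(A)
    t0 = float(xbar @ W @ xbar)      # n * xbar' A^{-1} xbar
    t1 = orthant_dist2(xbar, W)      # n * min_{mu >= 0} (xbar - mu)' A^{-1} (xbar - mu)
    return (t0 - t1) / (1.0 + t1)


def null_quantile(rho, n=50, reps=4000, level=0.95, seed=0):
    """Monte Carlo quantile of U under H0 with unit variances and correlation rho."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    stats = np.empty(reps)
    for r in range(reps):
        X = rng.multivariate_normal(np.zeros(2), cov, size=n)  # data under H0
        stats[r] = lrt_orthant(X)
    return float(np.quantile(stats, level))


q_neg = null_quantile(-0.9)
q_pos = null_quantile(0.9)
print(f"95% null quantile of U: rho=-0.9 -> {q_neg:.4f}, rho=0.9 -> {q_pos:.4f}")
```

If the null distribution were free of $\Sigma$, the two Monte Carlo quantiles would agree up to simulation noise. Instead, the $\rho = 0.9$ quantile should come out noticeably larger than the $\rho = -0.9$ one, so no single $\Sigma$-free critical value or $p$-value is available. Enumerating the faces of the orthant is exact but costs $2^p$ linear solves, so it only suits small $p$. A non-negative least squares solver would be the usual choice for larger $p$.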
