Upon observing $n$-dimensional multivariate Gaussian data, when can we infer that the largest $K$ observations came from the largest $K$ means? When $K=1$ and the covariance is isotropic, \cite{Gutmann} argue that this inference is justified when the two-sided difference-of-means test comparing the largest and second-largest observations rejects. Leveraging tools from selective inference, we provide a generalization of their procedure that applies to any $K$ and any covariance structure. We show that our procedure draws the desired inference whenever the two-sided difference-of-means test rejects for the pair of observations, one inside and one outside the top $K$, with the smallest standardized difference, and sometimes even when this test fails to reject. Using this insight, we argue that our procedure renders existing simultaneous inference approaches inadmissible when $n > 2$. When the observations are independent (with possibly unequal variances) or equicorrelated, our procedure corresponds exactly to running the two-sided difference-of-means test on this pair, that is, the pair of observations inside and outside the top $K$ with the smallest standardized difference.
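As an illustration of the test described above in the independent case, a minimal sketch follows. It assumes independent observations $Y_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$ with known variances $\sigma_i^2$, a level $\alpha$, and the illustrative notation $\widehat{\mathcal{T}}$ for the index set of the $K$ largest observations and $z_{1-\alpha/2}$ for the standard normal $(1-\alpha/2)$-quantile. Because $Y_i \ge Y_j$ whenever $i \in \widehat{\mathcal{T}}$ and $j \notin \widehat{\mathcal{T}}$, every standardized difference is nonnegative, so the two-sided test on the least-separated pair rejects exactly when
\[
Z_{\min} \;=\; \min_{i \in \widehat{\mathcal{T}},\; j \notin \widehat{\mathcal{T}}} \frac{Y_i - Y_j}{\sqrt{\sigma_i^2 + \sigma_j^2}} \;>\; z_{1-\alpha/2}.
\]
For $K = 1$ and equal variances $\sigma_i^2 = \sigma^2$, the minimum is attained by the largest and second-largest observations, and the condition reduces to $(Y_{(1)} - Y_{(2)})/(\sqrt{2}\,\sigma) > z_{1-\alpha/2}$, where $Y_{(1)} \ge Y_{(2)}$ denote the two largest observations.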