We consider the problem of testing a null hypothesis defined by equality and inequality constraints on a statistical parameter. Testing such hypotheses can be challenging because the number of relevant constraints may be of the same order as, or even larger than, the number of observed samples. Moreover, standard distributional approximations may be invalid due to irregularities in the null hypothesis. We propose a general testing methodology that aims to circumvent these difficulties. The constraints are estimated by incomplete U-statistics, and we derive critical values by a Gaussian multiplier bootstrap. We show that the bootstrap approximation of incomplete U-statistics is valid for kernels that we call mixed degenerate, provided the number of combinations used to compute the incomplete U-statistic is of the same order as the sample size. It follows that our test controls type I error even in irregular settings. Furthermore, the bootstrap approximation covers high-dimensional settings, making our testing strategy applicable to problems with many constraints. The methodology is applicable, in particular, when the constraints to be tested are polynomials in U-estimable parameters. As an application, we consider goodness-of-fit tests of latent tree models for multivariate data.
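To make the testing strategy concrete, the following is a minimal sketch rather than the procedure analyzed in the paper. It assumes mean-zero data in four variables and a single tetrad constraint, s12*s34 - s13*s24 = 0, which is a polynomial in U-estimable second moments and holds in a one-factor model. The names `tetrad_kernel` and `incomplete_u_test`, the number of partners `K` used to approximate the projection term, and the choice to draw the `N` index pairs uniformly with replacement are all illustrative assumptions.

```python
import numpy as np

def tetrad_kernel(x, y):
    """Symmetric order-2 kernel for rows of x and y, each of shape (k, 4).

    For mean-zero data it is unbiased for s12*s34 - s13*s24. The equality
    constraint is encoded as two inequalities, t <= 0 and -t <= 0.
    """
    def one_way(a, b):
        return (a[:, 0] * a[:, 1] * b[:, 2] * b[:, 3]
                - a[:, 0] * a[:, 2] * b[:, 1] * b[:, 3])
    t = 0.5 * (one_way(x, y) + one_way(y, x))
    return np.stack([t, -t], axis=1)

def incomplete_u_test(X, N, K=20, n_boot=1000, level=0.05, seed=0):
    """Max-type test with a Gaussian multiplier bootstrap critical value."""
    rng = np.random.default_rng(seed)
    n, m = X.shape[0], 2  # m is the kernel order
    # Incomplete U-statistic: average the kernel over N random pairs (i != j).
    i = rng.integers(0, n, size=N)
    j = (i + rng.integers(1, n, size=N)) % n
    H = tetrad_kernel(X[i], X[j])          # shape (N, p)
    U = H.mean(axis=0)
    # Rough estimate of the projection g(x) = E[h(x, X')], using K partners.
    G = np.mean([tetrad_kernel(X, np.roll(X, -s, axis=0))
                 for s in range(1, K + 1)], axis=0)   # shape (n, p)
    alpha_n = n / N
    # Variance combines the projection part and the random-sampling part.
    sigma = np.sqrt(m**2 * G.var(axis=0) + alpha_n * H.var(axis=0))
    T = np.max(np.sqrt(n) * U / sigma)
    # Gaussian multipliers for both parts, drawn independently.
    W_A = rng.standard_normal((n_boot, n)) @ (G - G.mean(axis=0)) / np.sqrt(n)
    W_B = rng.standard_normal((n_boot, N)) @ (H - U) / np.sqrt(N)
    W = m * W_A + np.sqrt(alpha_n) * W_B
    crit = np.quantile(np.max(W / sigma, axis=1), 1 - level)
    return T, crit, bool(T > crit)

# Simulated one-factor data, for which the tetrad constraint holds.
rng = np.random.default_rng(1)
n = 200
factor = rng.standard_normal((n, 1))
X = factor @ np.array([[0.9, 0.8, 0.7, 0.6]]) + rng.standard_normal((n, 4))
T, crit, reject = incomplete_u_test(X, N=2 * n)
print(f"T = {T:.3f}, critical value = {crit:.3f}, reject = {reject}")
```

Taking `N` proportional to `n` keeps the sampling term `alpha_n * H.var(axis=0)` in the variance estimate from vanishing, so the studentized statistic remains well defined even if the projection part is close to degenerate. This is the intuition behind the abstract's requirement on the number of combinations, stated here informally.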