In this work, we consider the problem of goodness-of-fit (GoF) testing for parametric models, for example, testing whether observed data follow a logistic regression model. Because the values of the model parameters are unknown, this testing problem involves a composite null hypothesis. In some special cases, co-sufficient sampling (CSS) can remove the influence of these unknown parameters by conditioning on a sufficient statistic, which is often the maximum likelihood estimator (MLE) of the unknown parameters. However, many common parametric settings (including logistic regression) do not permit this approach, since conditioning on a sufficient statistic leads to a powerless test. The recent approximate co-sufficient sampling (aCSS) framework of Barber and Janson (2022) offers an alternative: it replaces the sufficient statistic with an approximately sufficient one, namely a noisy version of the MLE. This approach recovers power in a range of settings where CSS cannot be applied, but it applies only when the unconstrained MLE is well-defined and well-behaved, which implicitly assumes a low-dimensional regime. Here, we extend aCSS to constrained and penalized maximum likelihood estimation, so that more complex estimation problems can be handled within the aCSS framework. Examples include mixtures of Gaussians (where the unconstrained MLE is not well-defined due to degeneracy) and high-dimensional Gaussian linear models (where a regularized MLE, for instance with an $\ell_1$ penalty or a shape constraint, can perform well).
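To make the idea of a "noisy" penalized MLE concrete, here is a minimal sketch for the high-dimensional Gaussian linear model with an $\ell_1$ penalty. It assumes unit noise variance (so the negative log-likelihood is $\tfrac12\|y - X\theta\|_2^2$ up to constants) and assumes the perturbation is a random linear term $\sigma W^\top\theta$ with $W \sim N(0, I_d)$; that linear form is our assumption for illustration and is not stated in this abstract. The function names `noisy_lasso` and `soft_threshold` and the parameters `lam`, `sigma`, `w` are hypothetical, and the solver is plain proximal gradient descent (ISTA), chosen only for simplicity. The sketch computes a single perturbed estimate; it does not implement the conditional sampling step or the GoF test itself.

```python
import random


def soft_threshold(z, t):
    """Proximal operator of t * |.|: shrinks z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def noisy_lasso(X, y, lam, sigma, w, iters=5000):
    """Minimize 0.5*||y - X theta||^2 + lam*||theta||_1 + sigma*<w, theta>
    by proximal gradient descent (ISTA).

    Hypothetical helper for illustration. X is a list of n rows of length d,
    y has length n, and w (the random perturbation direction) has length d.
    """
    n, d = len(X), len(X[0])
    # The smooth part has gradient X^T (X theta - y) + sigma * w, whose Lipschitz
    # constant is the largest eigenvalue of X^T X. The squared Frobenius norm of X
    # upper-bounds it, so 1 / ||X||_F^2 is a safe (if conservative) step size.
    step = 1.0 / sum(x * x for row in X for x in row)
    theta = [0.0] * d
    for _ in range(iters):
        resid = [sum(X[i][j] * theta[j] for j in range(d)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) + sigma * w[j] for j in range(d)]
        # Gradient step on the smooth part, then the l1 proximal (shrinkage) step.
        theta = [soft_threshold(theta[j] - step * grad[j], step * lam) for j in range(d)]
    return theta


if __name__ == "__main__":
    rng = random.Random(0)
    n, d = 20, 5
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    beta = [2.0, -1.0, 0.0, 0.0, 0.0]
    y = [sum(X[i][j] * beta[j] for j in range(d)) + rng.gauss(0.0, 1.0) for i in range(n)]
    # Assumed perturbation: W ~ N(0, I_d), scaled inside the objective by sigma.
    w = [rng.gauss(0.0, 1.0) for _ in range(d)]
    print(noisy_lasso(X, y, lam=2.0, sigma=0.5, w=w))
```

Setting `sigma=0` recovers the ordinary (unperturbed) lasso estimate. The orthogonal-design case is easy to check by hand: with `X` equal to the identity, the objective separates across coordinates and each entry of the solution is the soft-thresholded value of $y_j - \sigma w_j$ at level `lam`.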