In this work, we consider the problem of goodness-of-fit (GoF) testing for parametric models -- for example, testing whether observed data follow a logistic regression model. This testing problem involves a composite null hypothesis, because the values of the model parameters are unknown. In some special cases, co-sufficient sampling (CSS) can remove the influence of these unknown parameters by conditioning on a sufficient statistic -- often, the maximum likelihood estimator (MLE) of the unknown parameters. However, many common parametric settings (including logistic regression) do not permit this approach, since conditioning on a sufficient statistic leads to a powerless test. The recent approximate co-sufficient sampling (aCSS) framework of Barber and Janson (2022) offers an alternative, replacing the sufficient statistic with an approximately sufficient one (namely, a noisy version of the MLE). This approach recovers power in a range of settings where CSS cannot be applied, but it requires the unconstrained MLE to be well-defined and well-behaved, which implicitly assumes a low-dimensional regime. In this work, we extend aCSS to the setting of constrained and penalized maximum likelihood estimation, so that more complex estimation problems can be handled within the aCSS framework. Examples include mixtures of Gaussians (where the unconstrained MLE is not well-defined due to degeneracy) and high-dimensional Gaussian linear models (where the MLE can perform well under regularization, such as an $\ell_1$ penalty or a shape constraint).