The mainstream theory of hypothesis testing in high-dimensional regression typically assumes that the underlying true model is a low-dimensional linear regression model, yet the Box-Cox transformation is a regression technique commonly used to mitigate anomalies such as non-additivity and heteroscedasticity. This paper introduces a more flexible framework, the non-parametric Box-Cox model with unspecified transformation, to address model mis-specification in high-dimensional linear hypothesis testing while preserving the interpretation of regression coefficients. Model estimation and computation in high dimensions pose challenges that go beyond traditional sparse penalization methods. We propose the constrained partial penalized composite probit regression method for sparse estimation and investigate its statistical properties. Additionally, we present a computationally efficient algorithm that uses the augmented Lagrangian method and coordinate majorization descent to solve regularization problems with folded concave penalization and linear constraints. For testing linear hypotheses, we propose the partial penalized composite likelihood ratio test, score test, and Wald test, and show that their limiting distributions under the null and local alternatives are generalized chi-squared distributions with the same degrees of freedom and noncentrality parameter. Extensive simulation studies are conducted to examine the finite-sample performance of the proposed tests. Our analysis of supermarket data illustrates potential discrepancies between our testing procedures and standard high-dimensional methods, highlighting the importance of our robustified approach.
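For orientation, the model class named above can be sketched as follows. This is an illustrative formulation under assumptions that the abstract does not state: the symbols $g$, $\beta$, $\varepsilon$, $y_k$, $C$, and $t$, the unit-variance normal error, and the threshold construction are all hypothetical notation chosen here, not necessarily the paper's.

```latex
% Assumed form of a Box-Cox-type model with unspecified transformation:
% g is an unknown, strictly increasing function (notation is illustrative).
\[
  g(Y) = x^{\top}\beta + \varepsilon, \qquad \varepsilon \sim N(0,1), \quad \varepsilon \text{ independent of } x .
\]
% The classical Box-Cox family is the parametric special case
\[
  g_{\lambda}(y) = \frac{y^{\lambda} - 1}{\lambda} \ (\lambda \neq 0), \qquad g_{0}(y) = \log y .
\]
% Because g is increasing, dichotomizing Y at any threshold y_k gives a probit model:
\[
  P(Y \le y_k \mid x) = P\bigl(g(Y) \le g(y_k) \mid x\bigr) = \Phi\bigl(g(y_k) - x^{\top}\beta\bigr),
\]
% where \Phi is the standard normal distribution function.
% A linear hypothesis on the coefficients can then be written, for example, as
\[
  H_0 : C\beta = t .
\]
```

Under these assumptions, each binary response $I(Y \le y_k)$ follows a probit model whose intercept $g(y_k)$ absorbs the unknown transformation while the slope vector $\beta$ is shared across thresholds. That is one way to see why combining probit fits at several thresholds can target $\beta$ without specifying $g$.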