This paper investigates the asymptotic distribution of the maximum-likelihood estimate (MLE) in multinomial logistic models in the high-dimensional regime where dimension and sample size are of the same order. While classical large-sample theory provides asymptotic normality of the MLE under certain conditions, such classical results are expected to fail in high dimensions, as documented for the binary logistic case in the seminal work of Sur and Candès [2019]. We address this issue in classification problems with three or more classes by developing asymptotic normality and asymptotic chi-square results for the multinomial logistic MLE (also known as the cross-entropy minimizer) on null covariates. Our theory leads to a new methodology to test the significance of a given feature. Extensive simulation studies on synthetic data corroborate these asymptotic results and confirm the validity of the proposed p-values for testing the significance of a given feature.