The logistic regression model is one of the most popular data-generation models for noisy binary classification problems. In this work, we study the sample complexity of estimating the parameters of the logistic regression model up to a given $\ell_2$ error, as a function of the dimension and the inverse temperature, under standard normal covariates. The inverse temperature controls the signal-to-noise ratio of the data-generation process. While both generalization bounds and the asymptotic performance of the maximum-likelihood estimator for logistic regression are well studied, the non-asymptotic sample complexity of parameter estimation, and in particular its dependence on the error and the inverse temperature, is absent from previous analyses. We show that the sample complexity curve has two change-points in the inverse temperature, which clearly separate the low-, moderate-, and high-temperature regimes.
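To make the role of the inverse temperature concrete, here is a minimal sketch of one common way to write the data-generation process. The parametrization is an assumption, not taken from the abstract: labels are $y \in \{-1, +1\}$, the parameter `w_star` is a unit vector, and $P(y = +1 \mid x) = 1 / (1 + e^{-\beta \langle w^\star, x \rangle})$ with $x \sim N(0, I_d)$. The helper names `sample_logistic_data` and `sign_agreement` are hypothetical.

```python
import math
import random


def sample_logistic_data(n, w_star, beta, rng):
    """Draw n pairs (x, y) with x ~ N(0, I_d) and
    P(y = +1 | x) = sigmoid(beta * <w_star, x>).

    Assumed parametrization: w_star is a unit vector and beta is the
    inverse temperature (larger beta means less label noise).
    """
    d = len(w_star)
    xs, ys = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        margin = beta * sum(wi * xi for wi, xi in zip(w_star, x))
        # sigmoid(z) = 0.5 * (1 + tanh(z / 2)), which avoids overflow in exp
        p_plus = 0.5 * (1.0 + math.tanh(0.5 * margin))
        y = 1 if rng.random() < p_plus else -1
        xs.append(x)
        ys.append(y)
    return xs, ys


def sign_agreement(xs, ys, w_star):
    """Fraction of labels that equal sign(<w_star, x>), i.e. the noiseless label."""
    hits = 0
    for x, y in zip(xs, ys):
        s = sum(wi * xi for wi, xi in zip(w_star, x))
        hits += (1 if s >= 0 else -1) == y
    return hits / len(ys)


rng = random.Random(0)
d = 5
w_star = [1.0 / math.sqrt(d)] * d  # unit-norm parameter (illustrative choice)
for beta in (0.1, 1.0, 10.0):
    xs, ys = sample_logistic_data(2000, w_star, beta, rng)
    print(f"beta={beta}: agreement with noiseless labels = {sign_agreement(xs, ys, w_star):.3f}")
```

Under this assumed model, a small $\beta$ (high temperature) gives labels that are close to coin flips, while a large $\beta$ (low temperature) gives labels that nearly coincide with $\mathrm{sign}(\langle w^\star, x \rangle)$. The sketch only illustrates how $\beta$ sets the signal-to-noise ratio; it does not reproduce the sample complexity results.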