The logistic regression model is one of the most popular data generation models for noisy binary classification problems. In this work, we study the sample complexity of estimating the parameters of the logistic regression model up to a given $\ell_2$ error, as a function of the dimension and the inverse temperature, with standard normal covariates. The inverse temperature controls the signal-to-noise ratio of the data generation process. Both generalization bounds and the asymptotic performance of the maximum-likelihood estimator for logistic regression are well studied. However, previous analyses do not give a non-asymptotic sample complexity for parameter estimation that makes its dependence on the error and the inverse temperature explicit. We show that the sample complexity curve has two change-points (or critical points) in terms of the inverse temperature, clearly separating the low, moderate, and high temperature regimes.
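To make the setting concrete, here is a minimal simulation sketch. The abstract does not spell out the parametrization, so it assumes a common one: covariates $x \sim \mathcal{N}(0, I_d)$, a unit-norm parameter $w^*$, an inverse temperature $\beta$, and labels $y \in \{-1, +1\}$ with $P(y = 1 \mid x) = \sigma(\beta \langle w^*, x \rangle)$, where $\sigma$ is the logistic function. The helper names (`sample_logistic`, `fit_mle`, `l2_error`) are hypothetical. The estimator is plain full-batch gradient ascent on the average log-likelihood, used here only as a stand-in for the maximum-likelihood estimator, and the $\ell_2$ error is measured against $\beta w^*$. This is an illustrative choice, not necessarily the error metric analyzed in the paper.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)

def sample_logistic(w_star, beta, n, rng):
    """Draw n pairs (x, y) with x ~ N(0, I_d) and
    P(y = +1 | x) = sigmoid(beta * <w_star, x>), y in {-1, +1} (assumed model)."""
    d = len(w_star)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        p = sigmoid(beta * sum(wi * xi for wi, xi in zip(w_star, x)))
        y = 1 if rng.random() < p else -1
        data.append((x, y))
    return data

def fit_mle(data, d, steps=300, lr=1.0):
    """Approximate the unregularized MLE by gradient ascent on
    (1/n) * sum_i log sigmoid(y_i * <theta, x_i>)."""
    theta = [0.0] * d
    n = len(data)
    for _ in range(steps):
        grad = [0.0] * d
        for x, y in data:
            margin = y * sum(t * xi for t, xi in zip(theta, x))
            # d/dtheta log sigmoid(margin) = (1 - sigmoid(margin)) * y * x
            coef = (1.0 - sigmoid(margin)) * y / n
            for j in range(d):
                grad[j] += coef * x[j]
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta

def l2_error(a, b):
    # Euclidean distance between two parameter vectors.
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# Usage with hypothetical values: d = 3, beta = 2, n = 1000.
rng = random.Random(0)
w_star = [1.0, 0.0, 0.0]  # hypothetical unit-norm parameter
beta = 2.0                # hypothetical inverse temperature
data = sample_logistic(w_star, beta, n=1000, rng=rng)
theta_hat = fit_mle(data, d=len(w_star))
print("l2 error vs beta * w_star:", l2_error(theta_hat, [beta * w for w in w_star]))
```

Raising `beta` in this sketch makes the labels nearly deterministic (the low-temperature end), so the sample moves toward being linearly separable, and in that case the unregularized MLE may fail to exist. Keep `beta` moderate when experimenting with this sketch.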