Logistic regression is widely used in many fields. Several works compare the performance of the lasso and maximum likelihood estimation in logistic regression. However, some of these works do not perform simulation studies, and the others do not consider scenarios in which the ratio of the number of covariates to the sample size is high. In this work, we compare the discrimination performance of the lasso and maximum likelihood estimation in logistic regression using simulation studies and applications. Variable selection is performed by the lasso itself and, when maximum likelihood estimation is used, by stepwise selection. We consider a wide range of values for the ratio of the number of covariates to the sample size. The main conclusion is that the lasso has better discrimination performance than maximum likelihood estimation when this ratio is high.
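To make the kind of comparison described above concrete, here is a minimal sketch of a single simulation replication, written under assumptions that are ours and not the paper's. Discrimination is measured by the test-set AUC (c-statistic). Covariates are independent standard normals with only three nonzero coefficients. The lasso penalty `lam` is fixed rather than tuned. The maximum likelihood fit uses all covariates, with no stepwise selection. Both fits use the same proximal gradient routine, and setting `lam = 0` gives the unpenalized fit. All function names (`simulate`, `fit_logistic`, `auc`, `compare`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate(n, beta, rng):
    """Hypothetical design: independent N(0, 1) covariates, no intercept,
    responses drawn from a logistic model with coefficients beta."""
    X = [[rng.gauss(0.0, 1.0) for _ in beta] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(sum(b * x for b, x in zip(beta, row))) else 0
         for row in X]
    return X, y


def soft_threshold(v, t):
    # Proximal operator of t * |v|: shrinks toward zero and sets small values to exactly 0.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_logistic(X, y, lam, step=0.1, iters=1000):
    """Proximal gradient descent on mean negative log-likelihood + lam * sum(|b_j|).
    lam = 0 gives an (approximate) unpenalized maximum likelihood fit.
    The intercept b0 is not penalized."""
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(iters):
        # r_i = fitted probability - y_i; the gradient is the mean of r_i * x_i.
        r = [sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, row))) - yi
             for row, yi in zip(X, y)]
        b0 -= step * sum(r) / n
        for j in range(p):
            g = sum(ri * row[j] for ri, row in zip(r, X)) / n
            b[j] = soft_threshold(b[j] - step * g, step * lam)
    return b0, b


def auc(scores, y):
    """Area under the ROC curve in Mann-Whitney form: the share of
    (positive, negative) pairs ranked correctly, ties counting 1/2."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def compare(n, p, lam, seed=0, iters=1000):
    """One replication: fit on n training rows, return
    (test AUC of the ML fit, test AUC of the lasso fit)."""
    rng = random.Random(seed)
    beta = [1.0, -1.0, 0.5] + [0.0] * (p - 3)  # hypothetical: 3 active covariates, p >= 3
    X_tr, y_tr = simulate(n, beta, rng)
    X_te, y_te = simulate(1000, beta, rng)  # large independent test sample
    result = []
    for penalty in (0.0, lam):
        b0, b = fit_logistic(X_tr, y_tr, penalty, iters=iters)
        scores = [b0 + sum(bj * xj for bj, xj in zip(b, row)) for row in X_te]
        result.append(auc(scores, y_te))
    return tuple(result)


if __name__ == "__main__":
    # p / n = 0.8: a high covariate-to-sample-size ratio.
    print(compare(n=50, p=40, lam=0.05, iters=500))
```

Averaging the two AUCs over many seeds, and over several values of p/n, would approximate the kind of study the abstract describes. The numbers this sketch prints are not results from the paper. The stepwise selection, the choice of the penalty (for example, by cross-validation), and the paper's actual simulation settings are not reproduced here. When p/n is high, the unpenalized maximum likelihood estimate may not exist because of separation. In that case, the `lam = 0` fit is simply wherever the fixed number of iterations stops.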