Logistic regression is a ubiquitous method for probabilistic classification. Its effectiveness, however, depends on careful and relatively computationally expensive tuning, especially of the regularisation hyperparameter, and especially for high-dimensional data. We present a prevalidated ridge regression model that closely matches logistic regression in terms of classification error and log-loss, particularly for high-dimensional data, while being significantly more computationally efficient and having effectively no hyperparameters beyond regularisation. We scale the coefficients of the model so as to minimise log-loss for a set of prevalidated predictions derived from the estimated leave-one-out cross-validation error. This exploits quantities already computed in the course of fitting the ridge regression model, so the scaling parameter is found at nominal additional computational expense.
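The idea can be illustrated with a minimal sketch (not the authors' implementation; all names, the penalty value, and the simple grid search are illustrative assumptions): leave-one-out predictions for ridge regression follow in closed form from the hat matrix already computed during fitting, and a single scaling parameter is then chosen to minimise the logistic log-loss of those prevalidated predictions.

```python
# Hedged sketch, not the paper's code: ridge regression with closed-form
# leave-one-out (LOO) predictions, then one scaling parameter fitted to
# minimise log-loss on those "prevalidated" predictions.
import numpy as np

rng = np.random.default_rng(0)
n, p = 60, 100                       # high-dimensional setting: p > n
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p)
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)
t = 2.0 * y - 1.0                    # regression targets in {-1, +1}

lam = 10.0                           # illustrative ridge penalty (normally tuned)
# Hat matrix H = X (X'X + lam I)^{-1} X'; for ridge,
# LOO prediction_i = t_i - (t_i - yhat_i) / (1 - H_ii).
H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
yhat = H @ t
loo_pred = t - (t - yhat) / (1.0 - np.diag(H))   # prevalidated predictions

def log_loss(c):
    # Logistic log-loss of sigmoid(c * loo_pred), computed stably;
    # with targets in {-1, +1}, loss_i = log(1 + exp(-t_i * c * pred_i)).
    return np.mean(np.logaddexp(0.0, -t * c * loo_pred))

# One-dimensional search over the scaling parameter (a grid here for
# simplicity; the loss is convex in c, so any scalar optimiser works).
cs = np.linspace(0.01, 20.0, 400)
c_star = cs[np.argmin([log_loss(c) for c in cs])]
print(c_star, log_loss(c_star))
```

Because the hat matrix (or its diagonal) is a by-product of fitting the ridge model, the prevalidated predictions and the one-dimensional search for the scaling parameter add only nominal cost on top of the fit itself.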