Generalized Linear Models (GLMs) assume that the predictor variables have a linear effect on the outcome. This assumption is often too strict, because in many applications predictors are nonlinearly related to the outcome. Optimal Scaling (OS) transformations combined with GLMs can handle this type of relation. Transformations of the predictors have been integrated into GLMs before, e.g., in Generalized Additive Models, but the OS methodology has several benefits. For example, the levels of categorical predictors are quantified directly, so they can be included in the model without defining dummy variables. This makes the effect of the different levels on the outcome easier to interpret and visualize. Furthermore, monotonicity restrictions can be applied to the OS transformations so that the original ordering of the category values is preserved. This improves the interpretation of the effect and may prevent overfitting. The scaling level can be chosen separately for each predictor, so a model can include mixed scaling levels and a suitable transformation can be found for each predictor. The implementation of OS in logistic regression is demonstrated using three datasets that contain a binary outcome variable and a set of categorical and/or continuous predictor variables.
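The idea of quantifying category levels under a monotonicity restriction can be illustrated with a minimal sketch. This is not the algorithm of the paper: it assumes a single ordinal predictor and a binary outcome, takes the observed proportion of positive outcomes per level as the raw quantification, and makes it nondecreasing with a weighted pool-adjacent-violators step. The function name `pava` and the toy counts are hypothetical.

```python
def pava(values, weights):
    """Weighted pool-adjacent-violators: nondecreasing fit to `values`."""
    blocks = []  # each block holds [weighted mean, total weight, pooled levels]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Pool neighbouring blocks while the ordering is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    fitted = []
    for mean, _, n in blocks:
        fitted.extend([mean] * n)  # every pooled level gets the block mean
    return fitted


# Toy data (assumed): an ordinal predictor with four levels.
counts = [10, 10, 10, 10]   # observations per level
positives = [2, 5, 4, 8]    # positive outcomes per level

# Raw quantification per level: observed proportion of positives.
raw = [p / c for p, c in zip(positives, counts)]

# Monotone quantification: levels 2 and 3 violate the ordering and are pooled.
monotone = pava(raw, counts)
print(raw)
print(monotone)
```

In this toy case the second and third levels are pooled to a common value, so the quantified predictor respects the original category order and can enter a logistic regression as a single numeric variable instead of a set of dummy variables.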