We propose a generalisation of the logistic regression model that aims to account for non-linear main effects and complex interactions while keeping the model inherently explainable. This is achieved by starting with log-odds that are linear in the covariates and adding non-linear terms that each depend on at least two covariates. More specifically, we use a generative specification of the model, which combines certain margins in natural exponential form with vine copulas. Estimation is nevertheless based on the discriminative likelihood, and dependencies between covariates are included in the model only if they contribute significantly to distinguishing between the two classes. We also present a scheme for model selection and estimation. The methods described in this paper are implemented in the R package LogisticCopula. To assess the performance of our model, we ran an extensive simulation study. The results of this study, together with a couple of examples on real data, show that our model performs at least as well as natural competitors, particularly in the presence of non-linearities and complex interactions, even when the sample size $n$ is not large compared to the number of covariates $p$.
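As a sketch of why a generative specification of this kind gives log-odds that are linear plus interaction terms, consider the following derivation. The notation is assumed here for illustration and is not taken from the paper: $\pi_k$ is the prior probability of class $k \in \{0, 1\}$, $f_{jk}$ and $F_{jk}$ are the density and distribution function of covariate $x_j$ in class $k$, and $c_k$ is the class-specific vine copula density. For simplicity it also assumes that each margin is a natural exponential family density whose base measure $h_j$ is shared by both classes.

$$
f(x \mid y = k) = c_k\bigl(F_{1k}(x_1), \dots, F_{pk}(x_p)\bigr) \prod_{j=1}^{p} f_{jk}(x_j),
\qquad
f_{jk}(x_j) = h_j(x_j) \exp\bigl(\theta_{jk} x_j - A_j(\theta_{jk})\bigr).
$$

Applying Bayes' rule and taking logarithms, the base measures $h_j$ cancel and the log-odds become

$$
\log \frac{P(y = 1 \mid x)}{P(y = 0 \mid x)}
= \beta_0 + \sum_{j=1}^{p} \beta_j x_j
+ \log \frac{c_1\bigl(F_{11}(x_1), \dots, F_{p1}(x_p)\bigr)}{c_0\bigl(F_{10}(x_1), \dots, F_{p0}(x_p)\bigr)},
$$

$$
\beta_j = \theta_{j1} - \theta_{j0},
\qquad
\beta_0 = \log \frac{\pi_1}{\pi_0} - \sum_{j=1}^{p} \bigl(A_j(\theta_{j1}) - A_j(\theta_{j0})\bigr).
$$

Under these assumptions the copula ratio is the only source of non-linearity. If both classes have the independence copula ($c_1 = c_0 = 1$), the last term vanishes and ordinary logistic regression is recovered; otherwise, each pair-copula in the vine contributes a term that depends on at least two covariates.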