Kernel methods are widely used in machine learning for classification and prediction because they can capture complex non-linear patterns in data. Single-kernel approaches are inherently limited, however: they rely on one type of kernel function (e.g., the Gaussian kernel), which may be insufficient to represent the heterogeneous, multifaceted nature of real-world data. Multiple kernel learning (MKL) addresses this limitation by constructing composite kernels from simpler ones and by integrating information from heterogeneous sources. Traditional MKL methods, however, are designed primarily for continuous outcomes. We extend MKL to outcome variables whose distribution belongs to the exponential family, which covers a broader variety of data types, and refer to the proposed method as generalized linear models with integrated multiple additive regression with kernels (GLIMARK). Empirically, we demonstrate that GLIMARK can effectively recover or approximate the true data-generating mechanism. We apply it to a COVID-19 chest X-ray dataset to predict the binary outcome of ICU escalation and to extract clinically meaningful features, underscoring the practical utility of the approach in real-world scenarios.
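To make the idea of a composite kernel with a non-Gaussian outcome concrete, the following is a minimal sketch and not the GLIMARK algorithm itself. It assumes a fixed, non-negative weighting of a Gaussian and a linear kernel (in MKL the weights would typically be learned), a binary outcome with a logit link, a ridge-type penalty, and plain gradient descent. All function names, the simulated data, and the tuning values are hypothetical choices made for illustration.

```python
import numpy as np

def gaussian_kernel(X, Z, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Z."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Z**2, axis=1)[None, :]
          - 2.0 * X @ Z.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def linear_kernel(X, Z):
    """Linear kernel matrix between the rows of X and Z."""
    return X @ Z.T

def composite_kernel(kernels, weights):
    """Non-negative weighted sum of kernel matrices (still a valid kernel)."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("kernel weights must be non-negative")
    return sum(w * K for w, K in zip(weights, kernels))

def penalized_loss(K, y, alpha, ridge):
    """Mean Bernoulli negative log-likelihood plus (ridge / 2) * alpha' K alpha."""
    eta = K @ alpha
    nll = np.mean(np.logaddexp(0.0, eta) - y * eta)
    return nll + 0.5 * ridge * alpha @ K @ alpha

def fit_kernel_logistic(K, y, ridge=1e-2, n_iter=300):
    """Kernel logistic regression fitted by gradient descent (illustrative only)."""
    n = K.shape[0]
    lam_max = np.linalg.eigvalsh(K)[-1]
    # Upper bound on the curvature of the loss, which gives a safe step size.
    lipschitz = 0.25 * lam_max**2 / n + ridge * lam_max
    step = 1.0 / lipschitz
    alpha = np.zeros(n)
    for _ in range(n_iter):
        eta = K @ alpha
        p = np.exp(-np.logaddexp(0.0, -eta))  # numerically stable sigmoid
        grad = K @ (p - y) / n + ridge * (K @ alpha)
        alpha -= step * grad
    return alpha

# Simulated data for illustration only (not the COVID-19 dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 2))
eta_true = X[:, 0] + np.sin(2.0 * X[:, 1])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta_true))).astype(float)

# Fixed, assumed weights; MKL would estimate these from the data.
K = composite_kernel([gaussian_kernel(X, X), linear_kernel(X, X)], [0.5, 0.5])

ridge = 1e-2
loss_before = penalized_loss(K, y, np.zeros(len(y)), ridge)
alpha = fit_kernel_logistic(K, y, ridge=ridge)
loss_after = penalized_loss(K, y, alpha, ridge)
```

Swapping the Bernoulli log-likelihood and logit link for another exponential-family member (for example, Poisson with a log link) would change only the loss and gradient lines, which is the sense in which a generalized linear formulation broadens the outcome types a kernel model can handle.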