With the rapid development of modern technology, massive amounts of data with complex patterns are being generated. Gaussian process models, which can readily capture non-linearity in data, are becoming increasingly popular. Often only a few features in a data set are important, or active. However, unlike in classical linear models, identifying active variables in Gaussian process models is challenging. One of the most commonly used methods for variable selection in Gaussian process models is automatic relevance determination, which is known to be open-ended: there is no rule of thumb for choosing the threshold below which features are dropped, which makes variable selection in Gaussian process models ambiguous. In this work, we propose two variable selection algorithms for Gaussian process models that use artificial nuisance columns as a baseline for identifying the active features. Moreover, the proposed methods work for both regression and classification problems. The algorithms are demonstrated using comprehensive simulation experiments and an application to multi-subject electroencephalography data from a study of the alcoholic levels of experimental subjects.
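The idea of using nuisance columns as a baseline can be illustrated with a minimal sketch. The abstract does not spell out the two proposed algorithms, so what follows is only one plausible reading, not the authors' method: append artificial noise columns to the design matrix, fit any Gaussian process with automatic relevance determination (the fitting step is omitted here), and keep the real features whose relevance exceeds that of every nuisance column. The function names `add_nuisance_columns` and `select_active`, the use of standard-normal noise, and the "maximum nuisance relevance" rule are all assumptions made for illustration.

```python
import random


def add_nuisance_columns(X, n_nuisance, seed=0):
    """Append n_nuisance columns of independent standard-normal noise to each row.

    Assumption: the nuisance columns are pure noise, unrelated to the response.
    """
    rng = random.Random(seed)
    return [list(row) + [rng.gauss(0.0, 1.0) for _ in range(n_nuisance)]
            for row in X]


def select_active(relevance, n_real):
    """Return indices of real features that beat the nuisance baseline.

    relevance: one ARD relevance score per column (for example an inverse
               length-scale), real features first, nuisance columns last.
    n_real:    number of real (non-nuisance) features.
    Assumption: the baseline is the largest relevance among nuisance columns.
    """
    if not 0 < n_real < len(relevance):
        raise ValueError("need at least one real and one nuisance column")
    baseline = max(relevance[n_real:])
    return [j for j in range(n_real) if relevance[j] > baseline]


# Two real features, two appended nuisance columns.
X = [[0.1, 2.0], [0.4, 1.5]]
X_aug = add_nuisance_columns(X, n_nuisance=2)

# Hypothetical relevance scores, as if taken from an ARD fit on X_aug.
relevance = [1.8, 0.02, 0.05, 0.03]
print(select_active(relevance, n_real=2))  # [0]
```

In this sketch the threshold is no longer chosen by hand: it is set by the columns known to be inactive, which is the ambiguity in automatic relevance determination that the abstract describes.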