In many research fields, researchers aim to identify significant associations between a set of explanatory variables and a response while controlling the false discovery rate (FDR). The knockoff filter was recently proposed in the frequentist paradigm. It introduces controlled noise into a model by carefully constructing copies of the predictors (knockoffs) as auxiliary variables. We develop a fully Bayesian generalization of the classical model-X knockoff filter for normally distributed covariates. Our approach uses a joint model for the covariates and the response. A Gaussian graphical model captures the conditional independence structure of the covariates, and this structure is used to define a latent knockoff layer through a parameter-expanded representation of the response model. Estimating the covariate graph informs the knockoff construction and improves inference on the covariate effects. We place a modified spike-and-slab prior on the regression coefficients. This avoids the increase in model dimension that the classical knockoff filter incurs by adding the knockoff copies to the design. We also discuss extensions to non-Gaussian responses. Our model performs variable selection using an upper bound on the posterior probability of non-inclusion. We show that, under the proposed construction, the induced latent knockoff layer defines valid Gaussian model-X knockoffs. If the distribution of the covariates is fully known, the resulting procedure controls the Bayesian FDR at an arbitrary level in finite samples. Under an estimated graphical structure, it satisfies an asymptotic FDR guarantee. Simulation studies show that our proposal yields more stable selections than classical knockoff methods. Compared with Bayesian variable selection methods, our procedure achieves comparable or better performance while maintaining control of the FDR. We conclude with an application to real data.
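For reference, the sketch below shows the classical equicorrelated Gaussian model-X knockoff sampler for covariates X ~ N(0, Σ) with a known correlation matrix Σ. This is the setting in which the abstract states a finite-sample guarantee. It is the frequentist construction that the paper generalizes, not the authors' latent Bayesian knockoff layer. The function names `gaussian_knockoff_params` and `sample_knockoffs` are hypothetical, and the equicorrelated choice of `s` is one standard option among several.

```python
import numpy as np


def gaussian_knockoff_params(sigma: np.ndarray):
    """Equicorrelated Gaussian model-X knockoff parameters for X ~ N(0, sigma).

    Assumes `sigma` is a correlation matrix (unit diagonal). Returns the mean
    map A, the conditional covariance V, and the vector s, so that for a data
    matrix X (rows are observations): X_knock | X ~ N(X @ A, V) row-wise.
    """
    p = sigma.shape[0]
    lam_min = np.linalg.eigvalsh(sigma).min()
    # Equicorrelated choice: s_j = min(2 * lambda_min(sigma), 1) for all j.
    s = np.full(p, min(2.0 * lam_min, 1.0))
    D = np.diag(s)
    sigma_inv_D = np.linalg.solve(sigma, D)  # Sigma^{-1} D
    A = np.eye(p) - sigma_inv_D              # conditional mean: X (I - Sigma^{-1} D)
    V = 2.0 * D - D @ sigma_inv_D            # conditional covariance: 2D - D Sigma^{-1} D
    return A, V, s


def sample_knockoffs(X: np.ndarray, sigma: np.ndarray, rng: np.random.Generator):
    """Draw one knockoff copy for each row of X."""
    A, V, _ = gaussian_knockoff_params(sigma)
    # V can be singular at the equicorrelated boundary, so take a square root
    # via eigh and clip tiny negative eigenvalues instead of using Cholesky.
    w, U = np.linalg.eigh(V)
    root = U * np.sqrt(np.clip(w, 0.0, None))  # root @ root.T == V
    Z = rng.standard_normal(X.shape)
    return X @ A + Z @ root.T
```

By construction, the joint covariance of (X, X̃) is [[Σ, Σ - D], [Σ - D, Σ]] with D = diag(s). This pairwise exchangeable structure is what makes the copies valid knockoffs.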
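The abstract says that selection uses an upper bound on each variable's posterior probability of non-inclusion. The reasoning behind this works as follows. For a selected set S, the Bayesian FDR is the average over j in S of P(variable j excluded | data). Any per-variable upper bound q̄_j therefore bounds this average from above. The sketch below applies the standard Bayesian FDR thresholding rule to such bounds: sort them, then keep the largest prefix whose mean is at most α. The input `q_bar` and the function name `bayesian_fdr_select` are hypothetical. How the paper actually computes its bounds is not described here, so treat this as an illustration of the selection step only.

```python
import numpy as np


def bayesian_fdr_select(q_bar, alpha):
    """Return indices selected so that the mean of q_bar over them is <= alpha.

    q_bar[j] is assumed to be an upper bound on the posterior probability that
    variable j is NOT in the model. Because the running mean of the sorted
    values is non-decreasing, the admissible sets form a prefix of the order.
    """
    q_bar = np.asarray(q_bar, dtype=float)
    order = np.argsort(q_bar)  # most confidently included first
    running_mean = np.cumsum(q_bar[order]) / np.arange(1, q_bar.size + 1)
    admissible = np.nonzero(running_mean <= alpha)[0]
    if admissible.size == 0:
        return np.array([], dtype=int)
    k = admissible[-1] + 1  # size of the largest admissible prefix
    return np.sort(order[:k])


q = [0.01, 0.02, 0.3, 0.05, 0.9]
print(bayesian_fdr_select(q, 0.10).tolist())  # [0, 1, 2, 3]
print(bayesian_fdr_select(q, 0.03).tolist())  # [0, 1, 3]
```

Raising α admits variables with larger non-inclusion bounds. In the example, the fourth-ranked variable (bound 0.3) is admitted only once the running mean 0.095 falls under α.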