Bayesian Variable Selection Under High-dimensional Settings With Grouped Covariates

Consider the normal linear regression setup in which the number of covariates p is much larger than the sample size n, and the covariates form correlated groups. The response variable y is not related to groups of covariates on an all-or-none basis; rather, sparsity is assumed both within and between groups. We extend the traditional g-prior setup to this framework. Variable selection consistency of the proposed method is shown under fairly general conditions, assuming the covariates to be random and allowing the true model to grow with both n and p. To implement the proposed g-prior method in the high-dimensional setup, we propose two procedures: first, a group screening procedure, termed group SIS (GSIS), and second, a novel stochastic search variable selection algorithm, termed the group informed variable selection algorithm (GiVSA), which uses the known group structure efficiently to explore the model space without discarding any covariate on the basis of an initial screening. The screening consistency of GSIS and the theoretical mixing time of GiVSA are studied, the latter using the canonical path ensemble approach of Yang et al. (2016). The performance of the proposed prior, implemented with GSIS as well as with GiVSA, is validated on various simulated examples and on a real dataset related to residential buildings.
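As a rough illustration of what a group screening step can look like, the sketch below ranks groups by a marginal score and keeps the top few. The abstract does not define GSIS, so everything here is an assumption: the group score (the largest absolute marginal correlation between y and any covariate in the group), the function names `abs_corr` and `screen_groups`, and the toy data are hypothetical and are not taken from the paper.

```python
import math


def abs_corr(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no marginal signal
    return abs(sxy) / math.sqrt(sxx * syy)


def screen_groups(columns, groups, y, keep):
    """Rank groups by an assumed group score and return the top `keep`.

    columns: dict mapping covariate name -> list of observations
    groups:  dict mapping group label -> list of covariate names
    Assumed score: the largest absolute marginal correlation in the group.
    """
    scores = {
        label: max(abs_corr(columns[name], y) for name in members)
        for label, members in groups.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep], scores


# Toy data (hypothetical): only covariate b1 tracks the response closely.
y = [1, 2, 3, 4, 5, 6]
columns = {
    "a1": [1, -1, 1, -1, 1, -1],
    "a2": [1, 1, -1, -1, 1, 1],
    "b1": [2, 4, 6, 8, 10, 13],
    "b2": [0, 1, 0, 1, 0, 1],
    "c1": [3, 1, 2, 3, 1, 2],
}
groups = {"A": ["a1", "a2"], "B": ["b1", "b2"], "C": ["c1"]}

kept, scores = screen_groups(columns, groups, y, keep=2)
print(kept)
```

Scoring a group by its single strongest covariate fits the abstract's point that sparsity holds within groups as well as between them: a group can matter even when most of its members are unrelated to y. Other aggregations, such as a sum of squared correlations, would be equally plausible choices for a sketch like this.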
