Consider the normal linear regression setup in which the number of covariates p is much larger than the sample size n and the covariates form correlated groups. The response y is not related to an entire group of covariates on an all-or-none basis; rather, sparsity holds both within and between groups. We extend the traditional g-prior setup to this framework. Assuming random covariates and allowing the true model to grow with both n and p, we show variable selection consistency of the proposed method under fairly general conditions. To implement the proposed g-prior method in the high-dimensional setting, we propose two procedures. The first is a group screening procedure, termed group SIS (GSIS). The second is a novel stochastic search variable selection algorithm, termed the group informed variable selection algorithm (GiVSA). GiVSA uses the known group structure efficiently to explore the model space without discarding any covariate on the basis of an initial screening. Screening consistency of GSIS and the theoretical mixing time of GiVSA are studied using the canonical path ensemble approach of Yang et al. (2016). The performance of the proposed prior, implemented via both GSIS and GiVSA, is validated on various simulated examples and on a real dataset on residential buildings.
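For orientation, the *traditional* g-prior that this work extends gives a closed-form Bayes factor for any candidate model. The sketch below assumes Zellner's standard setup: centered covariates, a flat prior on the intercept, p(σ²) ∝ 1/σ², and β_γ | σ² ~ N(0, gσ²(X_γᵀX_γ)⁻¹). Under these assumptions the log Bayes factor of a model with p_γ covariates and coefficient of determination R²_γ, against the intercept-only model, is ½(n−1−p_γ)·log(1+g) − ½(n−1)·log(1+g(1−R²_γ)). The function name `g_prior_log_bf` is hypothetical. The sketch does not implement the group-structured prior proposed here.

```python
import math


def g_prior_log_bf(r2: float, n: int, p_gamma: int, g: float) -> float:
    """Log Bayes factor of model gamma vs. the intercept-only model under
    Zellner's traditional g-prior (hypothetical helper, illustration only).

    Assumes centered covariates, a flat prior on the intercept and
    p(sigma^2) proportional to 1/sigma^2.
    """
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("r2 must lie in [0, 1]")
    if p_gamma < 0 or n - 1 - p_gamma <= 0:
        raise ValueError("need 0 <= p_gamma < n - 1")
    if g <= 0:
        raise ValueError("g must be positive")
    # Reward for fit: the second term shrinks as R^2 grows.
    # Penalty for size: the first term shrinks as p_gamma grows.
    return (0.5 * (n - 1 - p_gamma) * math.log1p(g)
            - 0.5 * (n - 1) * math.log1p(g * (1.0 - r2)))


if __name__ == "__main__":
    n = 100
    g = float(n)  # "unit information" choice g = n, an assumption here
    # Same fit, different sizes: the smaller model is preferred.
    print(g_prior_log_bf(0.30, n, 2, g) - g_prior_log_bf(0.30, n, 5, g))
```

Under a uniform prior over models, differences of these log Bayes factors are log posterior odds between candidate models. A stochastic search algorithm such as GiVSA needs exactly these quantities when it moves between models.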