In this paper, we develop an {\em epsilon admissible subsets} (EAS) model selection approach for performing group variable selection in the high-dimensional multivariate regression setting. The EAS strategy is designed to estimate a posterior-like, generalized fiducial distribution over a parsimonious class of models when predictors are correlated and/or no sparsity assumption is made. We demonstrate the effectiveness of the approach empirically in simulation studies and compare it with other state-of-the-art model/variable selection procedures. Furthermore, assuming a matrix-Normal linear model, we show that the EAS strategy achieves {\em strong model selection consistency} in the high-dimensional setting if there exists a sparse, true data-generating set of predictors. In contrast to Bayesian approaches to model selection, our generalized fiducial approach completely avoids the problem of simultaneously having to specify arbitrary prior distributions for model parameters and penalize model complexity; it allows for inference directly on the model complexity. \textcolor{black}{Implementation of the method is illustrated on yeast data to identify significant cell-cycle regulating transcription factors.}