A general framework for formulating structured variable selection

In variable selection, a selection rule that prescribes the permissible sets of selected variables (called a "selection dictionary") is desirable due to the inherent structural constraints among the candidate variables. Such selection rules can be complex in real-world data analyses, and failing to incorporate such restrictions could not only compromise the interpretability of the model but also lead to decreased prediction accuracy. However, no general framework has been proposed to formalize selection rules and their applications, which poses a significant challenge for practitioners seeking to integrate these rules into their analyses. In this work, we establish a framework for structured variable selection that can incorporate universal structural constraints. We develop a mathematical language for constructing arbitrary selection rules, where the selection dictionary is formally defined. We demonstrate that all selection rules can be expressed as combinations of operations on constructs, facilitating the identification of the corresponding selection dictionary. Once this selection dictionary is derived, practitioners can apply their own user-defined criteria to select the optimal model. Additionally, our framework enhances existing penalized regression methods for variable selection by providing guidance on how to appropriately group variables to achieve the desired selection rule. Furthermore, our innovative framework opens the door to establishing new l0 norm-based penalized regression techniques that can be tailored to respect arbitrary selection rules, thereby expanding the possibilities for more robust and tailored model development.

翻译：在变量选择中，由于候选变量间存在固有结构约束，制定允许的选定变量集（称为“选择字典”）的筛选规则具有重要价值。实际数据分析中的此类筛选规则可能非常复杂，忽视这些约束不仅会削弱模型的可解释性，还可能降低预测精度。然而，目前尚未有通用框架对筛选规则及其应用进行形式化定义，这为研究人员将此类规则整合至分析中带来了重大挑战。本研究建立了可整合通用结构约束的结构化变量选择框架。我们开发了一套数学语言用于构建任意筛选规则，并正式定义了选择字典。研究表明所有筛选规则均可表示为结构操作的组合形式，这有助于识别对应的选择字典。一旦获得该选择字典，研究者便可应用自定义准则选择最优模型。此外，本框架通过指导如何合理分组变量以实现目标筛选规则，增强了现有用于变量选择的惩罚回归方法。更重要的是，这一创新框架为建立基于l0范数的定制化惩罚回归技术开辟了新途径，该技术能够遵循任意筛选规则，从而拓展了构建更稳健、更定制化模型的可能性。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

【AAAI2022】面向多标签分类的端到端概率标签特征学习

专知会员服务

32+阅读 · 2022年1月27日

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

【ACL2021】基于隐含结构推理网络的事件因果关系识别

专知会员服务

52+阅读 · 2021年8月13日