We study the problem of contextual feature selection, where the goal is to learn a predictive function while identifying subsets of informative features conditioned on specific contexts. Towards this goal, we generalize the recently proposed stochastic gates (STG) of Yamada et al. [2020] by modeling the probabilistic gates as conditional Bernoulli variables whose parameters are predicted from the contextual variables. Our new scheme, termed conditional-STG (c-STG), comprises two networks: a hypernetwork that maps the contextual variables to the parameters of the probabilistic feature selection, and a prediction network that maps the selected features to the response variable. Training the two networks simultaneously incorporates context and feature selection within a single unified model. We provide a theoretical analysis of several properties of the proposed framework. Importantly, because the selected features can change with the context, our model makes feature selection more flexible and adaptable and can therefore better capture the nuances and variations in the data. We apply c-STG to simulated and real-world datasets, spanning healthcare, housing, and neuroscience, and demonstrate that it effectively selects contextually meaningful features, thereby enhancing predictive performance and interpretability.
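To make the two-network design concrete, the following is a minimal forward-pass sketch in plain Python, not the authors' implementation. It assumes a one-layer hypernetwork with a sigmoid output, hard Bernoulli sampling of the gates, and a linear prediction network. All function names, parameter names, and numeric values (`gate_probabilities`, `W_h`, `w_f`, `demo_params`, and so on) are hypothetical. Training is omitted: hard samples are not differentiable, so gradient-based training would need a continuous relaxation of the gates.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gate_probabilities(context, W_h, b_h):
    # Assumed hypernetwork: one linear layer plus sigmoid, mapping the
    # context to one Bernoulli parameter per feature.
    return [sigmoid(a + b) for a, b in zip(matvec(W_h, context), b_h)]


def sample_gates(probs, rng):
    # Draw one hard 0/1 gate per feature from its Bernoulli parameter.
    return [1 if rng.random() < p else 0 for p in probs]


def predict(features, gates, w_f, b_f):
    # Assumed prediction network: a linear model on the gated features.
    return sum(w * x * g for w, x, g in zip(w_f, features, gates)) + b_f


def c_stg_forward(features, context, params, rng):
    # Context -> gate probabilities -> sampled gates -> prediction.
    probs = gate_probabilities(context, params["W_h"], params["b_h"])
    gates = sample_gates(probs, rng)
    y_hat = predict(features, gates, params["w_f"], params["b_f"])
    return y_hat, gates, probs


# Hypothetical parameters: 3 features, 1 context variable.
demo_params = {
    "W_h": [[4.0], [-4.0], [0.0]],
    "b_h": [0.0, 0.0, -6.0],
    "w_f": [1.0, 2.0, 3.0],
    "b_f": 0.5,
}

if __name__ == "__main__":
    rng = random.Random(0)
    for context in ([2.0], [-2.0]):
        y_hat, gates, probs = c_stg_forward([1.0, 1.0, 1.0], context, demo_params, rng)
        print(context, [round(p, 3) for p in probs], gates, y_hat)
```

With these illustrative weights, a positive context makes the first feature's gate very likely to open and the second very likely to close, and a negative context reverses this. That is the context-dependent selection the abstract describes.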