Feature selection is a crucial tool in machine learning and is widely applied across various scientific disciplines. Traditional supervised methods generally identify a universal set of informative features for the entire population. However, feature relevance often varies with context, while the context itself may not directly affect the outcome variable. Here, we propose a novel architecture for contextual feature selection where the subset of selected features is conditioned on the value of context variables. Our new approach, Conditional Stochastic Gates (c-STG), models the importance of features using conditional Bernoulli variables whose parameters are predicted based on contextual variables. We introduce a hypernetwork that maps context variables to feature selection parameters to learn the context-dependent gates along with a prediction model. We further present a theoretical analysis of our model, indicating that it can improve performance and flexibility over population-level methods in complex feature selection settings. Finally, we conduct an extensive benchmark using simulated and real-world datasets across multiple domains demonstrating that c-STG can lead to improved feature selection capabilities while enhancing prediction accuracy and interpretability.
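The gating mechanism described above can be sketched in a few lines. The sketch below is a minimal illustration under stated assumptions: it uses a single linear layer as the hypernetwork, a hard-sigmoid clipping of a Gaussian perturbation as the continuous relaxation of the conditional Bernoulli gates, and an arbitrary noise scale `sigma`; the actual c-STG architecture, parameterization, and training objective may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def hard_sigmoid(x):
    # Clip to [0, 1]: a common continuous relaxation of a Bernoulli gate.
    return np.clip(x + 0.5, 0.0, 1.0)

def hypernetwork(context, W, b):
    # Maps context variables to per-feature gate parameters mu(c).
    # A single linear layer here for illustration; the paper's
    # hypernetwork may be deeper.
    return context @ W + b

def conditional_gates(mu, sigma=0.5, train=True):
    # During training, inject Gaussian noise (reparameterization trick);
    # at test time, use the deterministic mean gate.
    noise = rng.normal(0.0, sigma, size=mu.shape) if train else 0.0
    return hard_sigmoid(mu + noise)

# Toy example: 4 features, 2-dimensional context, batch of 3 samples.
d_features, d_context = 4, 2
W = rng.normal(size=(d_context, d_features))
b = np.zeros(d_features)

x = rng.normal(size=(3, d_features))   # feature vectors
c = rng.normal(size=(3, d_context))    # matching context vectors

mu = hypernetwork(c, W, b)
z = conditional_gates(mu, train=False)  # one gate in [0, 1] per feature
x_gated = x * z                         # context-dependent feature selection
```

Because the gates `z` depend on the context `c`, two samples with identical features but different contexts can have different feature subsets passed to the downstream prediction model, which is the core distinction from population-level stochastic gates.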