Knowing which features of a complex system are highly relevant to a particular target variable is of fundamental interest in many areas of science. Existing approaches are often limited to linear settings, sometimes lack guarantees, and in most cases do not scale to the problem at hand, in particular to images. We propose DRCFS, a doubly robust feature selection method for identifying the causal features even in nonlinear and high-dimensional settings. We provide theoretical guarantees, illustrate necessary conditions for our assumptions, and perform extensive experiments across a wide range of simulated and semi-synthetic datasets. DRCFS significantly outperforms existing state-of-the-art methods, selecting robust features even in challenging, highly nonlinear and high-dimensional problems.