Knowing the features of a complex system that are highly relevant to a particular target variable is of fundamental interest in many areas of science. Existing approaches are often limited to linear settings, sometimes lack guarantees, and in most cases do not scale to the problem at hand, in particular to images. We propose DRCFS, a doubly robust feature selection method for identifying causal features even in nonlinear and high-dimensional settings. We provide theoretical guarantees, illustrate necessary conditions for our assumptions, and perform extensive experiments across a wide range of simulated and semi-synthetic datasets. DRCFS significantly outperforms existing state-of-the-art methods, selecting robust features even in challenging highly nonlinear and high-dimensional problems.