The Model-X knockoff framework has garnered significant attention among feature selection methods for its guarantee of false discovery rate (FDR) control. Since its introduction under parametric design assumptions, knockoff techniques have evolved to handle arbitrary data distributions using deep learning-based generative models. However, current implementations of the deep Model-X knockoff framework exhibit limitations. Notably, the "swap property" that knockoffs require (invariance of the joint distribution of features and knockoffs when any subset of features is exchanged with its knockoffs) is often violated at the sample level, diminishing selection power. To address these issues, we develop the "Deep Dependency Regularized Knockoff (DeepDRK)," a distribution-free deep learning method that effectively balances FDR and power. In DeepDRK, we introduce a novel formulation of knockoff generation as a learning problem under multi-source adversarial attacks, together with an innovative perturbation technique that achieves lower FDR and higher power. Our model outperforms existing benchmarks on synthetic, semi-synthetic, and real-world datasets, particularly when sample sizes are small and data distributions are non-Gaussian.
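To make the selection mechanism concrete, the sketch below shows the standard knockoff+ filter (Barber and Candès) that turns per-feature statistics W_j (e.g., the difference in importance between a feature and its knockoff) into an FDR-controlling selection. This is the generic filter shared by Model-X methods, not code from DeepDRK; the statistics `W` and target level `q` are synthetic, for illustration only.

```python
import numpy as np

def knockoff_threshold(W, q=0.1, offset=1):
    """Knockoff+ threshold: smallest t > 0 such that
    (offset + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q."""
    ts = np.sort(np.abs(W[W != 0]))  # candidate thresholds
    for t in ts:
        fdp_hat = (offset + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf  # no threshold achieves the target: select nothing

# Toy example: positive W_j suggests the original feature beats its knockoff.
rng = np.random.default_rng(0)
W = np.concatenate([rng.normal(2.0, 1.0, 20),   # 20 true signals
                    rng.normal(0.0, 1.0, 80)])  # 80 nulls (sign-symmetric)
tau = knockoff_threshold(W, q=0.2)
selected = np.flatnonzero(W >= tau)
print(f"threshold={tau:.3f}, selected {selected.size} features")
```

Any deep knockoff generator, DeepDRK included, feeds its sampled knockoffs into a filter of this form; the FDR guarantee holds only to the extent that the generated knockoffs satisfy the swap property, which is precisely the sample-level failure mode the abstract highlights.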