Controlling the false discovery rate (FDR) in high-dimensional variable selection requires balancing rigorous error control with statistical power. Existing methods with provable guarantees are often overly conservative, creating a persistent gap between the realized false discovery proportion (FDP) and the target FDR level. We introduce a learning-augmented enhancement of the T-Rex Selector framework that narrows this gap. Our approach replaces the analytical FDP estimator with a neural network trained solely on diverse synthetic datasets, enabling a substantially tighter and more accurate approximation of the FDP. This refinement allows the procedure to operate much closer to the desired FDR level, thereby increasing discovery power while maintaining effective approximate control. Through extensive simulations and a challenging synthetic genome-wide association study (GWAS), we demonstrate that our method achieves superior detection of true variables compared to existing approaches.