Model-X knockoffs is a flexible wrapper method for high-dimensional regression algorithms, which provides guaranteed control of the false discovery rate (FDR). Due to the randomness inherent to the method, different runs of model-X knockoffs on the same dataset often result in different sets of selected variables, which is undesirable in practice. In this paper, we introduce a methodology for derandomizing model-X knockoffs with provable FDR control. The key insight of our proposed method lies in the discovery that the knockoffs procedure is in essence an e-BH procedure. We make use of this connection, and derandomize model-X knockoffs by aggregating the e-values resulting from multiple knockoff realizations. We prove that the derandomized procedure controls the FDR at the desired level, without any additional conditions (in contrast, previously proposed methods for derandomization are not able to guarantee FDR control). The proposed method is evaluated with numerical experiments, where we find that the derandomized procedure achieves comparable power and dramatically decreased selection variability when compared with model-X knockoffs.
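The derandomization described above has three steps: turn each knockoff run into e-values, average them across runs, and apply e-BH to the average. The Python sketch below illustrates those steps and rests on several assumptions. The knockoff statistics W for each of M knockoff realizations are taken as given, so knockoff sampling and model fitting are out of scope. The threshold is the knockoff+ threshold. The per-run e-value is taken to be e_j = p * 1{W_j >= T} / (1 + #{k : W_k <= -T}). That is the standard construction in the derandomized-knockoffs literature, but the abstract does not spell it out. The knockoff level `alpha_kn` is treated as a free tuning parameter. All function names are hypothetical, and the W values are synthetic toy numbers.

```python
from typing import List, Sequence


def knockoff_threshold(W: Sequence[float], alpha_kn: float) -> float:
    """Knockoff+ threshold: the smallest t in {|W_j| : W_j != 0} with
    (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= alpha_kn.
    Returns inf when no such t exists, so nothing is selected."""
    for t in sorted({abs(w) for w in W if w != 0}):
        num = 1 + sum(1 for w in W if w <= -t)
        den = max(1, sum(1 for w in W if w >= t))
        if num / den <= alpha_kn:
            return t
    return float("inf")


def knockoff_evalues(W: Sequence[float], alpha_kn: float) -> List[float]:
    """Assumed per-run e-values: e_j = p * 1{W_j >= T} / (1 + #{k : W_k <= -T})."""
    p = len(W)
    T = knockoff_threshold(W, alpha_kn)
    if T == float("inf"):
        return [0.0] * p
    denom = 1 + sum(1 for w in W if w <= -T)
    return [p / denom if w >= T else 0.0 for w in W]


def ebh(e: Sequence[float], alpha: float) -> List[int]:
    """e-BH: with e sorted in decreasing order, find the largest k such that
    k * e_(k) / p >= 1 / alpha, then reject the k largest e-values."""
    p = len(e)
    order = sorted(range(p), key=lambda j: e[j], reverse=True)
    k_star = 0
    for k in range(1, p + 1):
        if k * e[order[k - 1]] / p >= 1 / alpha:
            k_star = k
    return sorted(order[:k_star])


def derandomized_knockoffs(W_runs: Sequence[Sequence[float]], alpha: float,
                           alpha_kn: float) -> List[int]:
    """Average the per-run e-values over the M runs, then apply e-BH at alpha.
    An average of e-values is again an e-value, so e-BH remains valid."""
    M = len(W_runs)
    p = len(W_runs[0])
    avg = [0.0] * p
    for W in W_runs:
        for j, e_j in enumerate(knockoff_evalues(W, alpha_kn)):
            avg[j] += e_j / M
    return ebh(avg, alpha)


# Synthetic statistics for p = 10 variables from three hypothetical knockoff runs.
runs = [
    [6.0, 5.0, 4.0, 3.0, 2.5, 2.0, -1.0, -1.5, 0.0, 0.0],
    [6.0, 5.0, 4.0, 3.0, 2.5, 2.0, 2.2, 0.0, 0.0, 0.0],
    [6.0, 5.0, 4.0, 3.0, 2.5, 2.0, 0.0, 1.8, 0.0, 0.0],
]

for W in runs:
    # e-BH at alpha_kn on a single run's e-values reproduces that run's
    # knockoff+ selection {j : W_j >= T}.
    print(ebh(knockoff_evalues(W, 0.3), 0.3))

print(derandomized_knockoffs(runs, alpha=0.2, alpha_kn=0.3))
```

On this toy input, the three single runs select three different sets: {0..5}, {0..6}, and {0..5, 7}. The derandomized procedure keeps only {0..5}, the variables chosen in every run. This mirrors the reduced selection variability the abstract describes, shown here only on made-up numbers.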