We investigate the robustness of the model-X knockoffs framework to a misspecified or estimated feature distribution. We do so by theoretically studying the feature selection performance of a practically implemented knockoffs algorithm, which we call the approximate knockoffs (ARK) procedure, as measured by the false discovery rate (FDR) and the family-wise error rate (FWER). The approximate knockoffs procedure differs from the model-X knockoffs procedure only in that it uses the misspecified or estimated feature distribution in place of the true one. A key technique in our theoretical analysis is to couple the approximate knockoffs procedure with the model-X knockoffs procedure so that the random variables in the two procedures are close in realization. We prove that if such a coupled model-X knockoffs procedure exists, the approximate knockoffs procedure achieves asymptotic FDR or FWER control at the target level. We showcase three specific constructions of such coupled model-X knockoff variables, which verifies their existence and justifies the robustness of the model-X knockoffs framework.
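To make the difference between the two procedures concrete, the following is a minimal sketch of an approximate knockoffs run in a toy Gaussian setting. Everything specific in it is an assumption made for illustration and is not taken from the abstract: the features are Gaussian, the knockoffs use the standard equicorrelated Gaussian construction, the statistics are differences of absolute marginal correlations, and selection uses the knockoff+ threshold for FDR control. The function names (`gaussian_knockoffs`, `knockoff_plus_threshold`, `demo`) are hypothetical. The only "approximate" step is that the sampler is given the estimated mean and covariance rather than the true ones.

```python
import numpy as np


def gaussian_knockoffs(X, mu, Sigma, rng):
    """Sample knockoffs for the rows of X, treating them as N(mu, Sigma).

    Passing the true (mu, Sigma) corresponds to model-X knockoffs in this
    toy setting; passing estimates corresponds to approximate knockoffs.
    """
    p = Sigma.shape[0]
    lam_min = np.linalg.eigvalsh(Sigma)[0]  # smallest eigenvalue of Sigma
    # Equicorrelated choice S = s * I with S <= 2 * Sigma, shrunk slightly
    # so that the conditional covariance below is strictly positive definite.
    s = 0.99 * min(2.0 * lam_min, np.diag(Sigma).min())
    S = s * np.eye(p)
    Sigma_inv_S = np.linalg.solve(Sigma, S)  # Sigma^{-1} S
    cond_mean = X - (X - mu) @ Sigma_inv_S   # E[X_tilde | X], row-wise
    cond_cov = 2.0 * S - S @ Sigma_inv_S     # Cov[X_tilde | X]
    cond_cov = (cond_cov + cond_cov.T) / 2.0  # symmetrize against round-off
    L = np.linalg.cholesky(cond_cov + 1e-10 * np.eye(p))
    return cond_mean + rng.standard_normal(X.shape) @ L.T


def knockoff_plus_threshold(W, q):
    """Smallest t among the nonzero |W_j| whose estimated FDP is at most q."""
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf  # no threshold qualifies, so nothing is selected


def demo(seed=0, n=500, p=20, k=5, q=0.2):
    """Toy run: the first k features are relevant; returns selected indices."""
    rng = np.random.default_rng(seed)
    idx = np.arange(p)
    Sigma_true = 0.3 ** np.abs(idx[:, None] - idx[None, :])  # AR(1) covariance
    X = rng.multivariate_normal(np.zeros(p), Sigma_true, size=n)
    beta = np.zeros(p)
    beta[:k] = 1.0
    y = X @ beta + rng.standard_normal(n)

    # Approximate knockoffs: plug in the estimated feature distribution.
    mu_hat = X.mean(axis=0)
    Sigma_hat = np.cov(X, rowvar=False)
    X_tilde = gaussian_knockoffs(X, mu_hat, Sigma_hat, rng)

    # Assumed statistic: difference of absolute marginal correlations.
    W = np.abs(X.T @ y) - np.abs(X_tilde.T @ y)
    tau = knockoff_plus_threshold(W, q)
    return np.flatnonzero(W >= tau)


if __name__ == "__main__":
    print("selected features:", demo())
```

Replacing `mu_hat` and `Sigma_hat` with the true mean and `Sigma_true` gives the model-X counterpart in this toy setting, so the two runs differ only in the feature distribution handed to the sampler.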