We propose one-at-a-time knockoffs (OATK), a new methodology for detecting important explanatory variables in linear regression models while controlling the false discovery rate (FDR). For each explanatory variable, OATK generates a knockoff design matrix that preserves the Gram matrix while replacing only the single corresponding column of the original design matrix. OATK substantially relaxes and simplifies the knockoff filter of Barber and Candès (BC), which generates all columns of the knockoff design matrix simultaneously and must therefore satisfy a much larger set of constraints. To test each variable's importance, test statistics are then constructed by comparing the original and knockoff coefficients. Under a mild correlation assumption on the original design matrix, OATK asymptotically controls the FDR at any desired level. Moreover, OATK consistently achieves (often substantially) higher power than BC and other approaches across a variety of simulation examples and a real genetics dataset. Generating knockoffs one at a time also has substantial computational advantages and facilitates additional enhancements, such as conditional calibration or derandomization, that further improve power and the consistency of FDR control. OATK can be viewed as the conditional randomization test (CRT) generalized to fixed-design linear regression problems, and it can generate fine-grained p-values for each hypothesis.
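To make the Gram-preservation constraint concrete, the following is a minimal sketch of one way to replace a single column of a design matrix without changing X^T X. It is an illustration under stated assumptions, not the construction used by OATK, which may impose further conditions on the knockoff column. The function name `single_column_knockoff` is hypothetical, and the sketch assumes that n exceeds p and that the remaining columns have full column rank.

```python
import numpy as np


def single_column_knockoff(X, j, rng):
    """Return a copy of X with column j replaced so that X.T @ X is unchanged.

    Illustrative assumption: n > p and the other columns have full column rank.
    """
    n, p = X.shape
    others = np.delete(X, j, axis=1)

    # Orthonormal basis for the span of the other p - 1 columns.
    Q, _ = np.linalg.qr(others)

    # Split column j into the part explained by the other columns
    # and the residual that is orthogonal to them.
    xj = X[:, j]
    fitted = Q @ (Q.T @ xj)
    resid = xj - fitted

    # Draw a random unit vector orthogonal to the other columns.
    z = rng.standard_normal(n)
    z -= Q @ (Q.T @ z)
    u = z / np.linalg.norm(z)

    # Keep the explained part, swap the residual direction, keep its length.
    X_tilde = X.copy()
    X_tilde[:, j] = fitted + np.linalg.norm(resid) * u
    return X_tilde


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
X_tilde = single_column_knockoff(X, 2, rng)
gram_gap = np.max(np.abs(X_tilde.T @ X_tilde - X.T @ X))
```

Because the new column keeps the same projection onto the other columns and the same residual length, its inner products with every other column and with itself are unchanged, so `gram_gap` should be zero up to floating-point error. Only one column needs to be constructed per variable, which is the sense in which the constraint set is much smaller than when all knockoff columns are generated jointly.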