Controlling the False Discovery Rate (FDR) in variable selection is critical for reproducible discoveries, and it has been extensively studied in sparse linear models. However, the problem remains largely open when the sparsity constraint is imposed not directly on the parameters but on a linear transformation of the parameters to be estimated. Examples of such scenarios include total variation, wavelet transforms, the fused LASSO, and trend filtering. In this paper, we propose a data-adaptive FDR control method, called the Split Knockoff method, for this transformational sparsity setting. The proposed method exploits both variable splitting and data splitting. The linear transformation constraint is relaxed to its Euclidean proximity in a lifted parameter space, which yields an orthogonal design that enables the orthogonal Split Knockoff construction. To overcome the failure of exchangeability caused by the heterogeneous noise that the transformation introduces, new inverse supermartingale structures are developed via data splitting, giving provable FDR control without sacrificing power. Simulation experiments demonstrate that the proposed methodology achieves the desired FDR and power. We also provide an application to an Alzheimer's Disease study, in which atrophied brain regions and their abnormal connections are discovered from a structural Magnetic Resonance Imaging dataset (ADNI).
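
The relaxation described above can be sketched in symbols. The abstract itself fixes no notation, so everything here is an assumption made for illustration: a response $y \in \mathbb{R}^n$, a design $X \in \mathbb{R}^{n \times p}$, a transformation $D \in \mathbb{R}^{m \times p}$, a proximity parameter $\nu > 0$, and a penalty level $\lambda > 0$. Under these assumptions, one plausible form of the lifted problem and of the orthogonal design it induces is:

```latex
% Assumed model: the sparsity lives on gamma^* = D beta^*, not on beta^*.
\[
  y = X\beta^* + \varepsilon, \qquad \gamma^* = D\beta^* \ \text{sparse}.
\]

% Variable splitting: the hard constraint gamma = D beta is relaxed to a
% Euclidean proximity term in the lifted parameter space (beta, gamma).
\[
  \min_{\beta,\gamma}\;
    \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
    + \frac{1}{2\nu}\,\lVert D\beta - \gamma \rVert_2^2
    + \lambda \lVert \gamma \rVert_1 .
\]

% The two quadratic terms stack into a single least-squares problem:
\[
  \frac{1}{2}\,\bigl\lVert \tilde{y} - A_\beta \beta - A_\gamma \gamma \bigr\rVert_2^2,
  \qquad
  \tilde{y} = \begin{pmatrix} y/\sqrt{n} \\ 0_m \end{pmatrix},\quad
  A_\beta = \begin{pmatrix} X/\sqrt{n} \\ D/\sqrt{\nu} \end{pmatrix},\quad
  A_\gamma = \begin{pmatrix} 0_{n \times m} \\ -I_m/\sqrt{\nu} \end{pmatrix}.
\]

% The design seen by gamma is orthogonal, whatever X and D are:
\[
  A_\gamma^{\top} A_\gamma = \frac{1}{\nu}\, I_m .
\]
```

In this sketch the columns of $A_\gamma$ are mutually orthogonal regardless of $X$ and $D$, which is one way to read the claim that the relaxation "yields an orthogonal design". As $\nu \to 0$ the proximity term forces $\gamma \to D\beta$, recovering the original constrained formulation.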