Selecting important features that have substantial effects on the response, with provable type-I error rate control, is a fundamental problem in statistics with wide-ranging practical applications. Existing knockoff filters, although shown to provide theoretical guarantees of false discovery rate (FDR) control, often struggle to balance high power against precision in pinpointing important features when there are large groups of strongly correlated features. To address this challenge, we develop a new filter that uses group knockoffs to achieve both powerful and precise selection of important features. In experiments on simulated data and an analysis of a real Alzheimer's disease genetic dataset, we find that the proposed filter not only controls the proportion of false discoveries but also identifies important features with power comparable to, and precision greater than, the existing group knockoff filter.