Selecting important features that have substantial effects on the response, with provable control of the type-I error rate, is a fundamental problem in statistics with wide-ranging practical applications. Existing knockoff filters, although shown to provide theoretical guarantees of false discovery rate (FDR) control, often struggle to balance high power against precision in pinpointing important features when there are large groups of strongly correlated features. To address this challenge, we develop a new filter that uses group knockoffs to achieve both powerful and precise selection of important features. In experiments on simulated data and an analysis of a real Alzheimer's disease genetic dataset, the proposed filter not only controls the proportion of false discoveries but also identifies important features with power comparable to, and precision greater than, that of the existing group knockoffs filter.