Consider a scenario in which a large number of explanatory features are analyzed for their association with a response variable, and these features are partitioned into groups according to domain-specific structure. Moreover, several such partitions may coexist, as happens in many real-life settings; spatial genome-wide association studies are one example. Researchers may be interested not only in identifying the features relevant to the response but also in determining the relevant groups within each partition. A group is considered relevant if it contains at least one relevant feature. To ensure the replicability of findings at various resolutions, it is essential to control the false discovery rate (FDR) of the findings at multiple layers simultaneously. This paper presents a general approach that leverages existing controlled selection procedures to produce more stable results with multilayer FDR control. The key contributions of our proposal are the development of a generalized e-filter that provides multilayer FDR control and the construction of a specific type of generalized e-values for evaluating feature importance. A primary application of our method is an improved version of Data Splitting (DS), called the eDS-filter. Furthermore, we combine the eDS-filter with a version of the group knockoff filter (gKF), resulting in a more flexible approach called the eDS+gKF filter. Simulation studies demonstrate that the proposed methods effectively control the FDR at multiple layers while maintaining or even improving power compared with other approaches. Finally, we apply the proposed methods to HIV mutation data.
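To make the multilayer notion concrete, the following minimal Python sketch computes the false discovery proportion (FDP) at each layer for a given feature selection; the FDR at a layer is the expectation of this quantity. It uses the abstract's definition that a group is relevant if it contains at least one relevant feature, and it assumes, as is common in multilayer FDR work, that a group counts as selected when it contains at least one selected feature. The function names, partitions, ground truth, and selection are hypothetical toy data. The sketch only evaluates a selection; it does not implement the proposed e-filter or eDS-filter.

```python
from typing import Dict, List, Set


def layer_fdp(selected: Set[int], relevant: Set[int], partition: Dict[int, int]) -> float:
    """FDP for one layer; `partition` maps each feature index to its group label."""
    # Assumption: a group is selected if it contains at least one selected feature.
    selected_groups = {partition[j] for j in selected}
    # A group is relevant if it contains at least one relevant feature.
    relevant_groups = {partition[j] for j in relevant}
    if not selected_groups:
        return 0.0  # no discoveries means no false discoveries
    false_groups = selected_groups - relevant_groups
    return len(false_groups) / len(selected_groups)


def multilayer_fdp(selected: Set[int], relevant: Set[int],
                   partitions: List[Dict[int, int]]) -> List[float]:
    """FDP at every layer; multilayer FDR control bounds the expectation of each entry."""
    return [layer_fdp(selected, relevant, p) for p in partitions]


n_features = 6
relevant = {0, 3}     # hypothetical truly relevant features
selected = {0, 1, 4}  # hypothetical output of some selection procedure
partitions = [
    {j: j for j in range(n_features)},       # layer 1: individual features
    {j: j // 2 for j in range(n_features)},  # layer 2: blocks of two features
    {j: j // 3 for j in range(n_features)},  # layer 3: blocks of three features
]

fdps = multilayer_fdp(selected, relevant, partitions)
print(fdps)  # [0.6666666666666666, 0.5, 0.0]
```

The same selection can look poor at the feature layer yet error-free at a coarse layer, which is why controlling the FDR at only one resolution does not guarantee replicable findings at the others.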