We consider problems where many, somewhat redundant, hypotheses are tested and we are interested in reporting the most precise rejections, with false discovery rate (FDR) control. For example, a common goal in genetics is to identify DNA variants that carry distinct information on a trait of interest. However, strong local dependencies between nearby variants make it challenging to distinguish which of the many correlated features most directly influence the phenotype. A common solution is then to identify sets of variants that cover the truly important ones. Depending on the signal strength, the contributions of individual variants can be resolved with more or less precision. Ensuring FDR control on the reported findings under such adaptive searches is, however, often impossible. To design a multiple comparison procedure that allows for an adaptive choice of resolution with FDR control, we leverage e-values and linear programming. We adapt this approach to problems where knockoffs and group knockoffs have been successfully applied to test conditional independence hypotheses. We demonstrate its efficacy by analyzing data from the UK Biobank.
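To make the role of e-values concrete, the sketch below implements the generic e-BH rule: with m hypotheses and target level alpha, it rejects the k* hypotheses with the largest e-values, where k* is the largest k such that the k-th largest e-value is at least m / (alpha k). This is only an illustration of FDR control with e-values at a single, fixed resolution. It is not the resolution-adaptive, linear-programming procedure described above, and the function name `e_bh` and the example inputs are assumptions made here.

```python
def e_bh(e_values, alpha=0.1):
    """Return the sorted indices rejected by the generic e-BH rule.

    Illustrative sketch only: a single, fixed resolution with no grouping
    and no linear program. `e_bh` is a hypothetical helper name.
    """
    m = len(e_values)
    # Order hypothesis indices by e-value, largest first.
    order = sorted(range(m), key=lambda i: e_values[i], reverse=True)
    k_star = 0
    for k, idx in enumerate(order, start=1):
        # Condition e_(k) >= m / (alpha * k), written without division.
        if e_values[idx] * alpha * k >= m:
            k_star = k
    # Reject the k_star hypotheses with the largest e-values.
    return sorted(order[:k_star])


# Hypothetical e-values for five hypotheses.
rejected = e_bh([50.0, 1.0, 25.0, 0.5, 12.0], alpha=0.2)
```

Because the threshold m / (alpha k) loosens as k grows, the rule keeps the largest k that satisfies the condition rather than stopping at the first failure.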