We consider problems in which many, somewhat redundant, hypotheses are tested and the goal is to report the most precise rejections while controlling the false discovery rate (FDR). This is the case, for example, when researchers are interested both in individual hypotheses and in group hypotheses, corresponding to intersections of sets of the original hypotheses, at several resolution levels. A concrete application is genome-wide association studies, where, depending on the signal strength, it may be possible to resolve the influence of individual genetic variants on a phenotype with greater or lesser precision. To adapt to the unknown signal strength, analyses are conducted at multiple resolutions, and researchers are most interested in the more precise discoveries. Assuring FDR control on the reported findings with these adaptive searches is, however, often impossible. To design a multiple comparison procedure that allows for an adaptive choice of resolution with FDR control, we leverage e-values and linear programming. We adapt this approach to problems where knockoffs and group knockoffs have been successfully applied to test conditional independence hypotheses. We demonstrate its efficacy by analyzing data from the UK Biobank.
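For intuition about how e-values can yield FDR control, the sketch below implements the standard e-BH (e-value Benjamini-Hochberg) procedure at a single resolution. It is offered only as background: it is not the multi-resolution procedure summarized above, which additionally relies on linear programming, and the function name `e_bh` and the example e-values are hypothetical.

```python
def e_bh(e_values, alpha):
    """Return the sorted indices rejected by the e-BH procedure.

    With m e-values, let e_(1) >= ... >= e_(m) be the sorted values and
    k* = max{k : k * e_(k) >= m / alpha} (k* = 0 if no k qualifies).
    e-BH rejects the hypotheses with the k* largest e-values.
    """
    m = len(e_values)
    # Indices of the hypotheses, ordered from largest to smallest e-value.
    order = sorted(range(m), key=lambda i: e_values[i], reverse=True)
    k_star = 0
    for k, idx in enumerate(order, start=1):
        # Keep the largest k that satisfies the e-BH threshold.
        if k * e_values[idx] >= m / alpha:
            k_star = k
    return sorted(order[:k_star])


# Hypothetical e-values for five hypotheses, target FDR level 0.1.
rejected = e_bh([60, 1, 30, 0.5, 12], alpha=0.1)
print(rejected)  # [0, 2]
```

In this toy example the threshold is m / alpha = 50. The two largest e-values (60 and 30) satisfy k * e_(k) >= 50 for k = 1 and k = 2, whereas the third (12) gives 36, so only hypotheses 0 and 2 are rejected.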