Second-order group knockoffs with applications to GWAS

Conditional testing via the knockoff framework allows one to identify, among a large number of possible explanatory variables, those that carry unique information about an outcome of interest, and it also provides a false discovery rate (FDR) guarantee on the selection. This approach is particularly well suited to the analysis of genome-wide association studies (GWAS), which aim to identify genetic variants that influence traits of medical relevance. While conditional testing can be both more powerful and more precise than traditional GWAS analysis methods, its vanilla implementation encounters a difficulty common to all multivariate analysis methods: it is challenging to distinguish among multiple, highly correlated regressors. This impasse can be overcome by shifting the object of inference from single variables to groups of correlated variables. To achieve this, it is necessary to construct "group knockoffs." While successful examples are already documented in the literature, this paper substantially expands the set of algorithms and software for group knockoffs. We focus in particular on second-order knockoffs, which are constructed to reproduce the first two moments (means and covariances) of the original variables, and for which we describe correlation matrix approximations that are appropriate for GWAS data and that result in considerable computational savings. We illustrate the effectiveness of the proposed methods with simulations and with the analysis of albuminuria data from the UK Biobank. The described algorithms are implemented in the open-source Julia package Knockoffs.jl, for which both R and Python wrappers are available.
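
The second-order construction and the FDR-controlled group selection can be illustrated with a short sketch. The Python/NumPy code below is a minimal illustration under stated assumptions, not the Knockoffs.jl API and not the paper's algorithm. It assumes the covariance matrix Sigma is known exactly and that each row of X is mean-zero Gaussian. It builds S with a simple equicorrelated group rule (S = gamma times the within-group blocks of Sigma, with gamma as large as positive semidefiniteness of 2*Sigma - S allows, capped at 1), samples knockoffs from the Gaussian conditional distribution, computes a hypothetical group statistic from marginal correlations with y, and applies the knockoff+ threshold of Barber and Candès (2015) to the group statistics. The function names, the AR(1) covariance, the pairwise grouping, and the simulated outcome are all illustrative choices; none of the GWAS-specific correlation matrix approximations described above are implemented here.

```python
import numpy as np


def group_equi_S(Sigma, groups):
    """Equicorrelated group choice S = gamma * D (illustrative, not Knockoffs.jl).

    D keeps only the within-group blocks of Sigma; gamma is the largest value,
    capped at 1, for which 2*Sigma - S stays positive semidefinite.
    """
    same_group = groups[:, None] == groups[None, :]
    D = np.where(same_group, Sigma, 0.0)  # block-diagonal part of Sigma
    L = np.linalg.cholesky(D)
    # L^{-1} Sigma L^{-T} has the same eigenvalues as D^{-1/2} Sigma D^{-1/2}
    A = np.linalg.solve(L, Sigma)
    M = np.linalg.solve(L, A.T)
    lam_min = np.linalg.eigvalsh((M + M.T) / 2).min()
    gamma = min(1.0, 2.0 * lam_min)
    return gamma * D


def sample_gaussian_knockoffs(X, Sigma, S, rng):
    """Draw Xk | X ~ N(X - X Sigma^{-1} S, 2S - S Sigma^{-1} S) row by row.

    Assumes each row of X is N(0, Sigma); then [X, Xk] has covariance
    [[Sigma, Sigma - S], [Sigma - S, Sigma]], i.e. second-order knockoffs.
    """
    n, p = X.shape
    Sigma_inv_S = np.linalg.solve(Sigma, S)
    mean = X - X @ Sigma_inv_S
    V = 2.0 * S - S @ Sigma_inv_S
    w, U = np.linalg.eigh((V + V.T) / 2)
    root = U * np.sqrt(np.clip(w, 0.0, None))  # root @ root.T == V up to rounding
    return mean + rng.standard_normal((n, p)) @ root.T


def group_marginal_W(X, Xk, y, groups):
    """Hypothetical group statistic W_g = sum_{j in g} (|X_j'y| - |Xk_j'y|).

    Swapping X_g with Xk_g flips the sign of W_g, as the knockoff filter requires.
    """
    z, zk = np.abs(X.T @ y), np.abs(Xk.T @ y)
    return np.array([z[groups == g].sum() - zk[groups == g].sum()
                     for g in np.unique(groups)])


def knockoff_plus_threshold(W, q):
    """Smallest t with (1 + #{W_g <= -t}) / max(1, #{W_g >= t}) <= q, else inf."""
    W = np.asarray(W, dtype=float)
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf


# Usage on simulated data (all settings below are illustrative assumptions).
rng = np.random.default_rng(1)
p, n = 20, 20_000
Sigma = 0.7 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))  # AR(1) correlation
groups = np.repeat(np.arange(p // 2), 2)  # hypothetical grouping: adjacent pairs
S = group_equi_S(Sigma, groups)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Xk = sample_gaussian_knockoffs(X, Sigma, S, rng)
y = 0.5 * X[:, 0] - 0.5 * X[:, 10] + rng.standard_normal(n)  # simulated outcome
W = group_marginal_W(X, Xk, y, groups)
T = knockoff_plus_threshold(W, q=0.2)
selected = np.flatnonzero(W >= T)  # empty when T is inf
print("selected groups:", selected)
```

Under the knockoff+ rule, any nonempty selection satisfies (1 + #negatives) / #selected <= q, so at least 1/q groups must be selected; with only ten groups and q = 0.2, an empty selection is a plausible outcome. As a toy check, knockoff_plus_threshold(np.array([10., 9, 8, 7, 6, 5, 4, 3, 2, -1]), 0.2) returns 2.0: at t = 1 the estimate is 2/9 > 0.2, while at t = 2 it drops to 1/9.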
