The Informed Elastic Net for Fast Grouped Variable Selection and FDR Control in Genomics Research

From arXiv. Published in the IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), 10-13 December 2023, Los Sueños, Costa Rica.

Modern genomics research relies on genome-wide association studies (GWAS) to identify the few genetic variants, among potentially millions, that are associated with diseases of interest. Only reproducible discoveries of groups of associations improve our understanding of complex polygenic diseases and enable the development of new drugs and personalized medicine. Thus, fast multivariate variable selection methods that achieve a high true positive rate (TPR) while controlling the false discovery rate (FDR) are crucial. Recently, the T-Rex+GVS selector was proposed, a version of the T-Rex selector that uses the elastic net (EN) as a base selector to perform grouped variable selection. Although it significantly increased the TPR in simulated GWAS compared to the original T-Rex, its comparatively high computational cost limits scalability. Therefore, we propose the informed elastic net (IEN), a new base selector that significantly reduces computation time while retaining the grouped variable selection property. We quantify its grouping effect and derive its formulation as a Lasso-type optimization problem, which is solved efficiently within the T-Rex framework by the terminated LARS algorithm. Numerical simulations and a GWAS demonstrate that the proposed T-Rex+GVS (IEN) exhibits the desired grouping effect, reduces computation time, and achieves the same TPR as T-Rex+GVS (EN) but with a lower FDR, which makes it a promising method for large-scale GWAS.
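To illustrate what a "Lasso-type" reformulation means, the sketch below uses the classical (naive) elastic net, not the IEN, whose exact formulation is not given in this abstract. It relies on the well-known data augmentation in which the ridge penalty is absorbed into the residual by stacking sqrt(lam2) times the identity under X and zeros under y. The function names, the penalty weights `lam1` and `lam2`, and the random data are assumptions made purely for illustration.

```python
import numpy as np

def augment_for_lasso(X, y, lam2):
    """Stack sqrt(lam2) * I under X and zeros under y (hypothetical helper).

    With this augmentation, ||y_aug - X_aug b||^2 = ||y - X b||^2 + lam2 * ||b||^2,
    so the naive elastic net becomes a Lasso problem on (X_aug, y_aug).
    """
    n, p = X.shape
    X_aug = np.vstack([X, np.sqrt(lam2) * np.eye(p)])
    y_aug = np.concatenate([y, np.zeros(p)])
    return X_aug, y_aug

def elastic_net_objective(X, y, beta, lam1, lam2):
    """Naive elastic net objective: squared loss + ridge term + l1 term."""
    resid = y - X @ beta
    return resid @ resid + lam2 * (beta @ beta) + lam1 * np.abs(beta).sum()

def lasso_objective(X, y, beta, lam1):
    """Lasso objective: squared loss + l1 term."""
    resid = y - X @ beta
    return resid @ resid + lam1 * np.abs(beta).sum()

# Illustrative random data (assumption, not taken from the paper).
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
y = rng.standard_normal(20)
beta = rng.standard_normal(5)
lam1, lam2 = 0.3, 1.5

X_aug, y_aug = augment_for_lasso(X, y, lam2)
en_value = elastic_net_objective(X, y, beta, lam1, lam2)
lasso_value = lasso_objective(X_aug, y_aug, beta, lam1)

# The two objectives agree for any beta, so they share the same minimizer.
print(np.isclose(en_value, lasso_value))
```

Because the two objectives coincide for every coefficient vector, any Lasso solver (including LARS-based ones) applied to the augmented data also solves the naive elastic net. The cost is p extra rows in the design matrix, which hints at why a cheaper base selector is attractive at GWAS scale.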

