Many-core algorithms for high-dimensional gradients on phylogenetic trees

The rapid growth in genomic pathogen data spurs the need for efficient inference techniques, such as Hamiltonian Monte Carlo (HMC) in a Bayesian framework, to estimate the parameters of phylogenetic models in which the number of parameters grows with the number of sequences $N$. HMC requires repeated calculation of the gradient of the data log-likelihood with respect to (wrt) all branch-length-specific (BLS) parameters, which traditionally takes $\mathcal{O}(N^2)$ operations using the standard pruning algorithm. A recent study proposes an approach that calculates this gradient in $\mathcal{O}(N)$ operations, enabling researchers to take advantage of gradient-based samplers such as HMC. The CPU implementation of this approach makes the gradient calculation computationally tractable for nucleotide-based models but falls short in performance for models with a larger state space, such as codon models. Here, we describe novel massively parallel algorithms to calculate the gradient of the log-likelihood wrt all BLS parameters that take advantage of graphics processing units (GPUs) and deliver manyfold speedups over previous CPU implementations. We benchmark these GPU algorithms on three computing systems using three evolutionary inference examples (carnivores, dengue and yeast) and observe a greater than 128-fold speedup over the CPU implementation for codon-based models and a greater than 8-fold speedup for nucleotide-based models. As a practical demonstration, we also estimate the timing of the first introduction of West Nile virus into the continental United States under a codon model with a relaxed molecular clock from 104 full viral genomes, an inference task that was previously intractable. We provide an implementation of our GPU algorithms in BEAGLE v4.0.0, an open-source library for statistical phylogenetics that enables parallel calculations on multi-core CPUs and GPUs.
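To make the $\mathcal{O}(N)$ claim concrete, the following is a minimal single-threaded sketch of how one post-order and one pre-order traversal can yield the log-likelihood gradient wrt every branch length at once. It is not the BEAGLE implementation and not the GPU algorithm described here. It assumes a Jukes-Cantor (JC69) nucleotide model, a single alignment site, and a tiny rooted tree; all function and variable names (`post_order`, `pre_order`, `log_likelihood_and_gradient`, the node labels) are hypothetical and chosen only for illustration.

```python
import numpy as np

S = 4                                   # nucleotide states (assumed JC69 model)
PI = np.full(S, 0.25)                   # JC69 stationary distribution
Q = (4.0 / 3.0) * (np.full((S, S), 0.25) - np.eye(S))  # JC69 rate matrix

def transition(t):
    """Closed-form JC69 transition matrix P(t) = exp(Q t)."""
    J = np.full((S, S), 0.25)
    return J + np.exp(-4.0 * t / 3.0) * (np.eye(S) - J)

def post_order(node, children, lengths, tips, p):
    """Fill p[node]: likelihood of the data below node, given its state."""
    if node in tips:
        p[node] = np.eye(S)[tips[node]]         # one-hot observed state
        return
    p[node] = np.ones(S)
    for c in children[node]:
        post_order(c, children, lengths, tips, p)
        p[node] *= transition(lengths[c]) @ p[c]

def pre_order(node, children, lengths, p, q):
    """Fill q[child]: joint probability of the child's state and all data
    outside the child's subtree."""
    for c in children.get(node, []):
        rest = q[node].copy()
        for s in children[node]:
            if s != c:                          # contributions of siblings
                rest *= transition(lengths[s]) @ p[s]
        q[c] = transition(lengths[c]).T @ rest
        pre_order(c, children, lengths, p, q)

def log_likelihood_and_gradient(root, children, lengths, tips):
    """Two traversals give d(log L)/d(t_c) for every branch c."""
    p, q = {}, {root: PI}
    post_order(root, children, lengths, tips, p)
    pre_order(root, children, lengths, p, q)
    log_lik = np.log(PI @ p[root])
    # For every non-root node c, q[c] @ p[c] equals the likelihood L, and
    # because dP/dt = P Q, the branch derivative is q[c] @ Q @ p[c].
    grad = {c: (q[c] @ Q @ p[c]) / (q[c] @ p[c]) for c in lengths}
    return log_lik, grad

# Hypothetical three-tip tree ((A,B)u,C)r with made-up branch lengths.
children = {"r": ["u", "C"], "u": ["A", "B"]}
lengths = {"u": 0.15, "A": 0.1, "B": 0.2, "C": 0.3}
tips = {"A": 0, "B": 1, "C": 0}                 # observed state indices
log_lik, grad = log_likelihood_and_gradient("r", children, lengths, tips)
```

Each traversal visits every node once, so the cost is linear in the number of branches, whereas rerunning the pruning pass separately for each branch derivative would be quadratic. In this sketch the per-node work is a handful of small matrix-vector products over the state space, which is the part that grows expensive for codon models and that a GPU implementation would parallelize.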

