Phylogenetics is now fundamental to the life sciences, providing insights into the earliest branches of life and into the origins and spread of epidemics. However, finding suitable phylogenies in the vast space of possible trees remains challenging. To address this problem, for the first time, we perform both tree exploration and inference in a continuous space where gradients can be computed. This continuous relaxation allows major leaps across tree space for both rooted and unrooted trees, and is less susceptible to convergence to local minima. Our approach outperforms the current best methods for inference on unrooted trees and, in simulation, accurately infers the tree and root in ultrametric cases. The approach is also effective on empirical data sets with very little data, which we demonstrate on the phylogeny of jawed vertebrates: only a few genes with an ultrametric signal were generally sufficient to resolve the major vertebrate lineages. Optimisation is made possible by automatic differentiation, and our method offers an effective way forward for exploring the most difficult, data-deficient phylogenetic questions.