Unsupervised learning has become a staple of classical machine learning, successfully identifying clustering patterns in data across a broad range of application domains. Surprisingly, despite its accuracy and elegant simplicity, unsupervised learning remains underused in phylogenetic tree inference. The main obstacle to its adoption in phylogenetics has been the lack of a simple yet meaningful way to embed phylogenetic trees in a vector space. Here, we propose the split-weight embedding, a simple yet powerful representation that allows standard clustering algorithms to be fit to the space of phylogenetic trees. We show that clustering on split-weight embedded trees recovers meaningful evolutionary relationships in both simulated and real (Adansonia baobabs) data.
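The abstract does not spell out how the split-weight embedding is constructed, so the following is only a minimal sketch under stated assumptions: each tree is summarized by its bipartitions (splits) of the taxon set, each split carries a weight such as the branch length of the corresponding edge, and a split absent from a tree gets weight 0. The function names (`canonical_split`, `embed`, `kmeans`), the toy four-taxon trees, and the use of k-means as the "standard clustering algorithm" are all illustrative choices, not the authors' implementation.

```python
from math import dist

def canonical_split(side, taxa):
    """Name a bipartition by the side that does not contain the first taxon.

    A split A|B and its mirror B|A are the same edge, so both must map to
    the same key (assumption: taxa are sortable labels).
    """
    side = frozenset(side)
    reference = min(taxa)
    return frozenset(taxa) - side if reference in side else side

def embed(trees, taxa):
    """Turn trees (dict: split -> weight) into vectors over all observed splits."""
    canon = [{canonical_split(s, taxa): w for s, w in t.items()} for t in trees]
    splits = sorted({s for t in canon for s in t},
                    key=lambda s: (len(s), sorted(s)))
    # Assumption: a split that a tree does not contain has weight 0.
    vectors = [[t.get(s, 0.0) for s in splits] for t in canon]
    return splits, vectors

def kmeans(vectors, k, iters=20):
    """Plain Lloyd's k-means with a deterministic farthest-point start."""
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(max(vectors,
                           key=lambda v: min(dist(v, c) for c in centers)))
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(v, centers[j]))
                  for v in vectors]
        for j in range(k):
            members = [v for v, l in zip(vectors, labels) if l == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy data: four unrooted quartet trees, internal split only (hypothetical weights).
taxa = ["A", "B", "C", "D"]
trees = [
    {("A", "B"): 1.0},  # topology AB|CD
    {("C", "D"): 1.2},  # the same split, written from the other side
    {("A", "C"): 0.9},  # topology AC|BD
    {("B", "D"): 1.1},  # the same split, written from the other side
]
splits, vectors = embed(trees, taxa)
labels = kmeans(vectors, k=2)
```

In this toy setting the two topologies occupy different coordinates of the embedding, so Euclidean clustering separates them. A real analysis would also include the pendant-edge splits and would parse the trees from files rather than hand-written dictionaries.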