Unsupervised learning is a staple of classical machine learning and has successfully identified clustering patterns in data across a broad range of application domains. Surprisingly, despite its accuracy and elegant simplicity, unsupervised learning has not been sufficiently exploited in phylogenetic tree inference. The main reason for this delayed adoption is the lack of a meaningful yet simple way of embedding phylogenetic trees into a vector space. Here, we propose the simple yet powerful split-weight embedding, which allows us to fit standard clustering algorithms to the space of phylogenetic trees. We show that clustering on the split-weight embedding recovers meaningful evolutionary relationships in both simulated and real (Adansonia baobabs) data.
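To make the idea concrete, the following is a minimal sketch of how a split-weight embedding followed by clustering might look. It rests on assumptions that the abstract does not spell out: each coordinate of the vector corresponds to one bipartition (split) of the taxa, its value is the length of the branch inducing that split, and it is zero when the split is absent from the tree. The nested-tuple tree format, the function names `split_weights`, `embed`, and `kmeans`, and the choice of a small hand-written k-means are all illustrative stand-ins, not the authors' implementation.

```python
from itertools import chain

# Toy tree format (an assumption for this sketch):
#   leaf     = ("A", branch_length)
#   internal = ([child, child, ...], branch_length)

def split_weights(tree, taxa):
    """Map each bipartition (split) of `taxa` to the length of its branch."""
    all_taxa = frozenset(taxa)
    ref = min(all_taxa)  # a split is stored as the side NOT containing ref
    weights = {}

    def visit(node):
        label, length = node
        if isinstance(label, str):
            leaves = frozenset([label])
        else:
            leaves = frozenset(chain.from_iterable(visit(c) for c in label))
        if leaves != all_taxa:  # the root has no parent edge
            side = all_taxa - leaves if ref in leaves else leaves
            # The two edges at a binary root induce the same split: add them.
            weights[side] = weights.get(side, 0.0) + length
        return leaves

    visit(tree)
    return weights

def embed(trees, taxa):
    """Return the ordered list of observed splits and one vector per tree."""
    per_tree = [split_weights(t, taxa) for t in trees]
    splits = sorted({s for w in per_tree for s in w},
                    key=lambda s: (len(s), sorted(s)))
    vectors = [[w.get(s, 0.0) for s in splits] for w in per_tree]
    return splits, vectors

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(vectors, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(max(vectors,
                           key=lambda v: min(sq_dist(v, c) for c in centers)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(v, centers[j]))
                  for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Four hypothetical trees on taxa A-D: two with topology AB|CD, two with AC|BD.
taxa = "ABCD"
trees = [
    ([([("A", 1.0), ("B", 1.0)], 0.5), ([("C", 1.0), ("D", 1.0)], 0.5)], 0.0),
    ([([("A", 1.1), ("B", 0.9)], 0.6), ([("C", 1.0), ("D", 1.0)], 0.4)], 0.0),
    ([([("A", 1.0), ("C", 1.0)], 0.5), ([("B", 1.0), ("D", 1.0)], 0.5)], 0.0),
    ([([("A", 0.9), ("C", 1.2)], 0.4), ([("B", 1.0), ("D", 1.1)], 0.7)], 0.0),
]
splits, vectors = embed(trees, taxa)
labels = kmeans(vectors, k=2)
print(labels)  # trees sharing a topology receive the same cluster label
```

Because every tree becomes a fixed-length numeric vector, any off-the-shelf clustering routine could replace the hand-written k-means here; the pure-Python version is used only to keep the sketch self-contained.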