Evolutionary accumulation models (EvAMs) are an emerging class of machine learning methods designed to infer the evolutionary pathways by which features are acquired. Applications include cancer evolution (accumulation of mutations), anti-microbial resistance (accumulation of drug resistances), genome evolution (organelle gene transfers), and a range of other themes in biology and beyond. Motivated by these applications, many EvAMs assume that features are gained irreversibly, so that no feature can be lost. Reversible approaches do exist, but they are often computationally much more demanding and statistically less stable. Our goal here is to explore whether modelling approaches that assume irreversibility can yield useful information about evolutionary dynamics that are in reality reversible. We identify, and use simulation studies to quantify, the errors that arise from neglecting reversible dynamics, and we show the situations in which approximate results from tractable models can be informative and reliable. In particular, EvAM inferences about the relative ordering of acquisitions and about the core dynamic structure of evolutionary pathways are robust to reversibility in many cases, while estimates of uncertainty and of feature interactions are more error-prone.