Evolutionary accumulation models (EvAMs) are an emerging class of machine learning methods designed to infer the evolutionary pathways by which features are acquired. Applications include cancer evolution (accumulation of mutations), anti-microbial resistance (accumulation of drug resistances), genome evolution (organelle gene transfers), and more diverse themes in biology and beyond. Motivated by these applications, many EvAMs assume that features are gained irreversibly: no loss of features can occur. Reversible approaches do exist, but they are often much more computationally demanding and statistically less stable. Our goal here is to explore whether modelling approaches that assume irreversibility can still yield useful information about evolutionary dynamics that are in reality reversible. We identify the errors involved in neglecting reversible dynamics, quantify them using simulation studies, and show the situations in which approximate results from tractable models can be informative and reliable. In particular, EvAM inferences about the relative orderings of acquisitions and the core dynamic structure of evolutionary pathways (which features are likely present when another is acquired) are robust to reversibility in many cases, while estimates of uncertainty and feature interactions are more error-prone.