We demonstrate that the assembly pathway method underlying Assembly Theory (AT) is a dictionary-based encoding scheme for "counting copies", of the kind widely used by popular statistical compression algorithms, some of which have been applied in many areas, including systems biology, chemistry, and biosignature classification. We show that AT performs similarly in all cases (synthetic or natural) to other simple coding schemes and underperforms compared to system-related indexes based upon algorithmic probability, which take into account the likelihood of related computable approximations of similar events. Our results also demonstrate that simple (and tractably computable) modular instructions can mislead AT, causing it to fail in practice to capture the properties of physical systems. These theoretical and empirical results imply that the assembly index, whose computable nature is not an advantage, does not offer substantial improvements over existing methods. In contrast, other resource-bounded (and therefore also computable) indexes that approximate algorithmic (Kolmogorov) complexity can separate organic from inorganic molecules and even perform better on the mass spectral information used by the authors of AT. We show the predictive power of these other system-driven indexes based on their solid foundations and empirical results.
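To make the notion of a dictionary-based scheme for "counting copies" concrete, the following is a minimal sketch of LZ78-style parsing. LZ78 is chosen here only as an assumed representative of the statistical compression family the abstract refers to. It is not the assembly index computation and not the authors' code, and the function name `lz78_phrases` is hypothetical. The number of phrases produced serves as a crude measure of how much of a string can be rebuilt by reusing earlier copies.

```python
def lz78_phrases(s: str) -> list[str]:
    """Split s into LZ78-style phrases.

    Each new phrase is the longest phrase already in the dictionary
    extended by one further symbol. Illustrative sketch only.
    """
    dictionary = {""}   # phrases seen so far (the empty phrase is the seed)
    phrases = []        # the parse, in order of appearance
    current = ""        # phrase being extended
    for ch in s:
        candidate = current + ch
        if candidate in dictionary:
            # Still a copy of something already seen: keep extending.
            current = candidate
        else:
            # First time this phrase appears: record it and start over.
            dictionary.add(candidate)
            phrases.append(candidate)
            current = ""
    if current:
        # Leftover tail that is a pure copy of an earlier phrase.
        phrases.append(current)
    return phrases


if __name__ == "__main__":
    for text in ("abababab", "abcdefgh"):
        parse = lz78_phrases(text)
        print(text, len(parse), parse)
```

Under this sketch a repetitive string yields fewer phrases than a string of the same length with no repeated blocks, which is the sense in which such schemes reward copies. It says nothing about the algorithmic-probability indexes the abstract contrasts them with.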