We demonstrate that the assembly pathway method underlying Assembly Theory (AT) is a dictionary-based encoding scheme for 'counting copies', of the kind widely used by popular statistical compression algorithms, some of which have been applied in many areas, including systems biology, chemistry and biosignature classification. We show that, in all cases (synthetic or natural), AT performs similarly to other simple coding schemes and underperforms system-related indexes based on algorithmic probability, which take into account the likelihood of related computable approximations of similar events. Our results also demonstrate that simple (and tractably computable) modular instructions can mislead AT, causing it to fail in practice to capture properties of physical systems. These theoretical and empirical results imply that the assembly index, whose computable nature is not an advantage, does not offer substantial improvements over existing methods. In contrast, other resource-bounded (and therefore also computable) indexes that approximate algorithmic (Kolmogorov) complexity can separate organic from inorganic molecules and even perform better on the mass spectral information used by the authors of AT. We show the predictive power of these other system-driven indexes, which rests on their solid foundations and empirical results.