Qualitative models provide crucial instruments for modelling complex biological systems. While advances in automated reasoning and symbolic encodings have enabled rigorous inference of these models from data, the process remains highly fragile. First, biological measurement errors inevitably propagate into formal model specifications. Second, when a specification becomes unsatisfiable, distinguishing between fundamental design flaws and minor technical errors is notoriously difficult. This uncertainty often leads to under-specification, as it is unclear which observations are still ``safe'' to incorporate. To overcome these challenges, we introduce a robust inference method based on weighted MaxSMT. By encoding uncertain biological observations as weighted soft constraints, our approach enables the solver to identify a model best reflecting the observations, even with some conflicting constraints. Our method allows for Boolean and multi-valued variable domains, alongside observations derived from discretisation (level constraints) and differential expression (ordering constraints). We show our approach can be used to successfully infer neural cell differentiation models from prior-knowledge networks with 200--1300 genes using ordering constraints on all included genes.
翻译:定性模型为复杂生物系统建模提供了关键工具。尽管自动推理与符号编码的进步已能够从数据中严格推断这些模型,但这一过程仍高度脆弱。首先,生物测量误差不可避免地会渗入形式化模型规范中。其次,当规范变得不可满足时,区分根本性设计缺陷与细微技术错误极为困难。这种不确定性常导致模型欠规范,因为无法确定哪些观测值仍可"安全"纳入。为应对这些挑战,我们提出一种基于加权MaxSMT的鲁棒推理方法。通过将不确定的生物观测值编码为加权软约束,我们的方法使求解器能够识别出最能反映观测值的模型,即便存在某些冲突约束。该方法支持布尔域与多值域变量,以及基于离散化(层级约束)和差异表达(排序约束)的观测值推导。我们展示了该方法可成功用于从包含200-1300个基因的先验知识网络中,利用所有包含基因的排序约束推断神经细胞分化模型。