When mathematical biology models are used to make quantitative predictions for clinical or industrial use, it is important that these predictions come with a reliable estimate of their accuracy (uncertainty quantification). Because models of complex biological systems are always substantial simplifications, model discrepancy arises: the mathematical model fails to recapitulate the true data-generating process. This presents a particular challenge for making accurate predictions, and especially for making accurate estimates of the uncertainty in those predictions. Experimentalists and modellers must choose which experimental procedures (protocols) are used to produce the data on which their models are trained. We propose to characterise uncertainty owing to model discrepancy with an ensemble of parameter sets, each of which results from training to data from a different protocol. The variability in predictions from this ensemble provides an empirical estimate of predictive uncertainty owing to model discrepancy, even for unseen protocols. As an example, we use electrophysiology experiments that investigate the kinetics of the hERG potassium ion channel. Here, 'information-rich' protocols allow mathematical models to be trained using numerous short experiments performed on the same cell. Typically, assuming independent observational errors and training a model to an individual experiment results in parameter estimates with very little dependence on observational noise. Moreover, parameter sets arising from the same model applied to different experiments often conflict, which is indicative of model discrepancy. Our methods will help select more suitable mathematical models of hERG for future studies, and will be widely applicable to a range of biological modelling problems.
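The ensemble idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the "true" process is a hypothetical sum of two exponentials, the deliberately misspecified model is a single exponential with one rate parameter `k`, the three "protocols" are simply different sampling windows, and fitting uses a plain grid search. The data are noise-free, so any disagreement between the fitted parameter sets comes from model discrepancy alone.

```python
import numpy as np


def true_process(t):
    # Hypothetical data-generating process (an assumption for this sketch):
    # a sum of a fast and a slow exponential decay.
    return 0.7 * np.exp(-1.0 * t) + 0.3 * np.exp(-0.1 * t)


def model(t, k):
    # Deliberately misspecified model: a single exponential with rate k.
    return np.exp(-k * t)


def fit(t, y, k_grid):
    # Least-squares fit by exhaustive search over candidate rates.
    residuals = model(t[None, :], k_grid[:, None]) - y[None, :]
    sse = (residuals ** 2).sum(axis=1)
    return k_grid[np.argmin(sse)]


# Three illustrative training "protocols": different sampling windows.
protocols = {
    "short": np.linspace(0.0, 1.0, 50),
    "medium": np.linspace(0.0, 5.0, 50),
    "long": np.linspace(0.0, 20.0, 50),
}
k_grid = np.linspace(0.01, 2.0, 2000)

# One fitted parameter per protocol: this collection is the ensemble.
ensemble = {name: fit(t, true_process(t), k_grid) for name, t in protocols.items()}

# Predict an unseen protocol with every ensemble member.
t_new = np.linspace(0.0, 10.0, 100)
predictions = np.array([model(t_new, k) for k in ensemble.values()])

# The range across members is an empirical spread attributable to discrepancy.
lower, upper = predictions.min(axis=0), predictions.max(axis=0)
spread = upper - lower

for name, k in ensemble.items():
    print(f"{name}: k = {k:.3f}")
print(f"largest spread over the unseen protocol: {spread.max():.3f}")
```

With a correctly specified model and noise-free data, every protocol would return the same parameter and the spread would collapse to zero; here the fitted rates differ because no single exponential can match the two-timescale process on every window.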