When mathematical biology models are used to make quantitative predictions for clinical or industrial use, it is important that these predictions come with a reliable estimate of their accuracy (uncertainty quantification). Because models of complex biological systems are always substantial simplifications, model discrepancy arises: the mathematical model fails to recapitulate the true data-generating process. This presents a particular challenge for making accurate predictions, and especially for making accurate estimates of the uncertainty in these predictions. Experimentalists and modellers must choose which experimental procedures (protocols) are used to produce the data on which their models are trained. We propose to characterise uncertainty owing to model discrepancy with an ensemble of parameter sets, each of which results from training to data from a different protocol. The variability in predictions from this ensemble provides an empirical estimate of predictive uncertainty owing to model discrepancy, even for unseen protocols. We use the example of electrophysiology experiments, which are used to investigate the kinetics of the hERG potassium ion channel. Here, "information-rich" protocols allow mathematical models to be trained using numerous short experiments performed on the same cell. Typically, assuming independent observational errors and training a model to an individual experiment results in parameter estimates with very little dependence on observational noise. Moreover, parameter sets arising from the same model applied to different experiments often conflict, which is indicative of model discrepancy. Our methods will help select more suitable mathematical models of hERG for future studies, and will be widely applicable to a range of biological modelling problems.
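The ensemble idea described above can be illustrated with a deliberately simple toy problem. The sketch below is not a hERG model and does not use real electrophysiology protocols: the "true" process, the straight-line model, the input ranges standing in for protocols, and all function names are assumptions made purely for illustration. A misspecified model is fitted separately to data from each protocol, and the range of the resulting predictions for an unseen input is compared with the range obtained by simply repeating one protocol with fresh noise.

```python
import random

def true_process(x):
    # Hypothetical "true" data-generating process (unknown in practice).
    return 1.0 + 0.5 * x + 0.2 * x * x

def model(params, x):
    # Deliberately misspecified model: a straight line, so discrepancy exists.
    intercept, slope = params
    return intercept + slope * x

def fit_line(xs, ys):
    # Closed-form ordinary least squares for the straight-line model.
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return (y_mean - slope * x_mean, slope)

def run_protocol(lo, hi, rng, n=200, noise_sd=0.01):
    # A toy "protocol": an input range observed with independent Gaussian noise.
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    ys = [true_process(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys

def prediction_range(param_sets, x):
    # Smallest and largest prediction across a collection of parameter sets.
    preds = [model(p, x) for p in param_sets]
    return min(preds), max(preds)

rng = random.Random(0)
protocols = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (0.0, 3.0)]
x_unseen = 4.0  # input not covered by any training protocol

# Ensemble: one parameter set per training protocol.
ensemble = [fit_line(*run_protocol(lo, hi, rng)) for lo, hi in protocols]
lo_pred, hi_pred = prediction_range(ensemble, x_unseen)
discrepancy_spread = hi_pred - lo_pred

# Comparison: repeat a single protocol with fresh noise each time.
repeats = [fit_line(*run_protocol(0.0, 3.0, rng)) for _ in range(len(protocols))]
lo_rep, hi_rep = prediction_range(repeats, x_unseen)
noise_spread = hi_rep - lo_rep

print("spread across protocols:", discrepancy_spread)
print("spread across noise repeats:", noise_spread)
```

In this toy setting the spread across protocols is much larger than the spread across noise repeats, mirroring the observation that parameter estimates depend little on observational noise yet conflict between experiments. The ensemble range is an empirical indication of discrepancy-driven uncertainty and is not guaranteed to bracket the true value.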