Mathematical models of natural and man-made systems often have many adjustable parameters that must be estimated from multiple, potentially conflicting datasets. Rather than reporting a single best-fit parameter vector, it is often more informative to generate an ensemble of parameter sets that collectively map out the trade-offs among competing objectives. This paper presents ParetoEnsembles.jl, an open-source Julia package that generates such ensembles using Pareto Optimal Ensemble Techniques (POETs), a simulated-annealing-based algorithm that requires no gradient information. The implementation corrects the original dominance relation from weak to strict Pareto dominance, reduces the per-iteration ranking cost from $O(n^2 m)$ to $O(nm)$ through an incremental update scheme, and adds multi-chain parallel execution for improved front coverage. We demonstrate the package on a cell-free gene expression model fitted to experimental data and a blood coagulation cascade model with ten estimated rate constants and three objectives. A controlled synthetic-data study reveals parameter identifiability structure, with individual rate constants off by several-fold yet model predictions accurate to 7%. A five-replicate coverage analysis confirms that timing features are reliably covered while peak amplitude is systematically overconfident. Validation against published experimental thrombin generation data demonstrates that the ensemble predicts held-out conditions to within 10% despite inherent model approximation error. By making ensemble generation lightweight and accessible, ParetoEnsembles.jl aims to lower the barrier to routine uncertainty characterization in mechanistic modeling.
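To make the distinction between weak and strict Pareto dominance concrete, the following is a minimal Python sketch, not the ParetoEnsembles.jl implementation. It assumes all objectives are minimized and that a point's rank is the number of archive members that strictly dominate it. The function names (`weakly_dominates`, `strictly_dominates`, `full_ranks`, `add_candidate`) are hypothetical and chosen for illustration only. The sketch also shows one way an incremental update can avoid recomputing all pairwise comparisons: when a candidate is added to an archive of $n$ points with $m$ objectives, only the $n$ comparisons involving the candidate are needed, which costs $O(nm)$ rather than $O(n^2 m)$.

```python
from typing import List, Sequence

Point = Sequence[float]


def weakly_dominates(a: Point, b: Point) -> bool:
    """a is no worse than b in every objective (minimization assumed)."""
    return all(x <= y for x, y in zip(a, b))


def strictly_dominates(a: Point, b: Point) -> bool:
    """a is no worse than b everywhere and strictly better in at least one objective."""
    return weakly_dominates(a, b) and any(x < y for x, y in zip(a, b))


def full_ranks(archive: List[Point]) -> List[int]:
    """Reference O(n^2 m) ranking: count the strict dominators of every point."""
    return [sum(strictly_dominates(other, p) for other in archive) for p in archive]


def add_candidate(archive: List[Point], ranks: List[int], candidate: Point) -> int:
    """Incremental O(n m) update: compare only the candidate against the archive.

    Existing ranks change only for members that the candidate dominates,
    so no member-versus-member comparison has to be repeated.
    """
    new_rank = 0
    for i, member in enumerate(archive):
        if strictly_dominates(member, candidate):
            new_rank += 1          # one more dominator of the candidate
        elif strictly_dominates(candidate, member):
            ranks[i] += 1          # the candidate is a new dominator of this member
    archive.append(candidate)
    ranks.append(new_rank)
    return new_rank


if __name__ == "__main__":
    archive: List[Point] = []
    ranks: List[int] = []
    for point in [(1.0, 2.0), (2.0, 1.0), (2.0, 2.0), (1.0, 2.0)]:
        add_candidate(archive, ranks, point)
    print(ranks)
    print(ranks == full_ranks(archive))
```

Under the weak relation, two identical objective vectors dominate each other, so duplicates inflate each other's rank. Under the strict relation they do not, which is why the duplicated point `(1.0, 2.0)` in the usage example keeps rank zero.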