Comparison of new computational methods for geostatistical modelling of malaria

Background: Geostatistical analysis of health data is increasingly used to model spatial variation in malaria prevalence, burden, and other metrics. Traditional inference methods for geostatistical modelling are notoriously computationally intensive, motivating the development of newer, approximate methods. The appeal of faster methods grows as the size of the region and the number of spatial locations being modelled increase.

Methods: We present an applied comparison of four proposed "fast" geostatistical modelling methods and the software provided to implement them: Integrated Nested Laplace Approximation (INLA), tree boosting with Gaussian processes and mixed effect models (GPBoost), Fixed Rank Kriging (FRK), and Spatial Random Forests (SpRF). We illustrate the four methods by estimating malaria prevalence at two spatial scales, country and continent. We compare the performance of the four methods on these data in terms of accuracy, computation time, and ease of implementation.

Results: Two of these methods, SpRF and GPBoost, do not scale well as the data size increases, and so are likely to be infeasible for larger-scale analysis problems. The two remaining methods, INLA and FRK, do scale well computationally; however, the resulting model fits are very sensitive to the user's modelling assumptions and parameter choices.

Conclusions: INLA and FRK both enable scalable geostatistical modelling of malaria prevalence data. However, care must be taken with both methods to assess the fit of the model to the data and the plausibility of its predictions, in order to select appropriate model assumptions and approximation parameters.

