Geostatistical analysis of health data is increasingly used to model spatial variation in malaria prevalence, burden, and other metrics. Traditional inference methods for geostatistical modelling are notoriously computationally intensive, which has motivated the development of newer, approximate methods. The appeal of faster methods grows as the size of the region and the number of spatial locations being modelled increase.

Methods: We present an applied comparison of four proposed 'fast' geostatistical modelling methods and the software provided to implement them: Integrated Nested Laplace Approximation (INLA), tree boosting with Gaussian processes and mixed effect models (GPBoost), Fixed Rank Kriging (FRK), and Spatial Random Forests (SpRF). We illustrate the four methods by estimating malaria prevalence at two spatial scales, country and continent. We compare their performance on these data in terms of accuracy, computation time, and ease of implementation.

Results: Two of these methods, SpRF and GPBoost, do not scale well as the data size increases, and so are likely to be infeasible for larger-scale analysis problems. The two remaining methods, INLA and FRK, do scale well computationally; however, the resulting model fits are very sensitive to the user's modelling assumptions and parameter choices.

Conclusions: INLA and FRK both enable scalable geostatistical modelling of malaria prevalence data. However, with either method, care must be taken to assess the fit of the model to the data and the plausibility of its predictions, in order to select appropriate model assumptions and approximation parameters.