Demand for computing power in scientific data analysis is constantly increasing, in particular when Markov Chain Monte Carlo (MCMC) methods are involved, for example for estimating integral functions or Bayesian posterior probabilities. In this paper, we describe the benefits of running MCMC in parallel on a cloud-based, serverless architecture. First, the computation can be spread over thousands of processes, greatly reducing the time the user must wait for the computation to complete. Second, the overhead of running several processes in parallel is minor and grows logarithmically with the number of processes. Third, the serverless approach does not require time-consuming efforts to maintain and update the computing infrastructure when the number of walkers increases, or to adapt the code to use the infrastructure optimally. The benefits are illustrated with the computation of the posterior probability distribution in a real astronomical analysis.
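To make the idea of spreading walkers over many processes concrete, the following is a minimal sketch, not the implementation described in the paper. It assumes independent Metropolis walkers (the paper's walkers may interact), a hypothetical standard-normal `log_posterior`, and a local `ThreadPoolExecutor` standing in for serverless function invocations. The names `run_walker` and `run_parallel` are illustrative only.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def log_posterior(x):
    # Hypothetical target: standard normal, up to an additive constant.
    return -0.5 * x * x


def run_walker(seed, n_steps=2000, step_size=1.0):
    """Run one independent Metropolis walker and return its chain."""
    rng = random.Random(seed)  # private generator, so walkers share no state
    x = rng.uniform(-1.0, 1.0)
    logp = log_posterior(x)
    chain = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        logp_new = log_posterior(proposal)
        # Accept with probability min(1, p_new / p_old); the tiny offset
        # guards against log(0) when rng.random() returns exactly 0.
        if math.log(rng.random() + 1e-300) < logp_new - logp:
            x, logp = proposal, logp_new
        chain.append(x)
    return chain


def run_parallel(n_walkers=8, n_steps=2000):
    # Assumption: each submitted task stands in for one serverless invocation.
    with ThreadPoolExecutor(max_workers=n_walkers) as pool:
        chains = list(pool.map(lambda s: run_walker(s, n_steps), range(n_walkers)))
    return chains


chains = run_parallel()
samples = [x for chain in chains for x in chain[500:]]  # drop burn-in
mean = sum(samples) / len(samples)
```

Because each walker only needs its seed and returns its chain, the same function could in principle be dispatched to any remote worker. Local threads are used here only to show the structure, and they will not reproduce the speed-up reported for the cloud setup.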