Coding and computation remain major bottlenecks in Markov chain Monte Carlo (MCMC) workflows, especially as modern sampling algorithms have grown increasingly complex while existing probabilistic programming systems are still limited in model support, extensibility, and composability. We introduce \textbf{AI4BayesCode}, an extensible LLM-driven system that translates natural-language descriptions of Bayesian models into runnable, validated MCMC samplers. To improve reliability, AI4BayesCode adopts a modular design. It decomposes each model into sampling blocks and maps each block to a built-in sampling component, which reduces the need to implement complex sampling algorithms from scratch. Reliability is further improved by validating model specifications before code generation and validating the generated sampler code afterward. AI4BayesCode also introduces a novel recursively stateful coding paradigm for MCMC, which allows modular sampling components, potentially developed by different contributors, to be composed coherently within larger MCMC procedures. We develop a benchmark suite to evaluate AI4BayesCode on sampler generation. Experiments show that AI4BayesCode can implement a wide range of Bayesian models from natural-language descriptions alone. As an open-ended system, its capabilities can continue to expand as the underlying AI agent improves and new built-in blocks are added.
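The abstract does not specify the interface of AI4BayesCode's sampling blocks or of its recursively stateful paradigm. The following is only a minimal sketch of the general idea under our own assumptions. It is not the system's actual code. The classes `NormalMeanBlock`, `InvGammaVarianceBlock`, and `SequentialBlock` and the method `step(params, rng)` are hypothetical names. The example is an ordinary two-block Gibbs sampler for a normal model with unknown mean and variance. Each block keeps its own local state and updates part of the shared parameters. A composite block is itself a block, so composites can nest recursively.

```python
import math
import random

# Hypothetical block interface (an assumption, not AI4BayesCode's API):
# every block exposes step(params, rng) -> params and may hold its own state.

class NormalMeanBlock:
    """Conjugate Gibbs update for mu | sigma2, y under a N(m0, s0_sq) prior."""
    def __init__(self, y, m0=0.0, s0_sq=100.0):
        self.y, self.m0, self.s0_sq = y, m0, s0_sq
        self.n_updates = 0  # block-local state, kept across iterations

    def step(self, params, rng):
        n = len(self.y)
        prec = 1.0 / self.s0_sq + n / params["sigma2"]  # posterior precision
        mean = (self.m0 / self.s0_sq + sum(self.y) / params["sigma2"]) / prec
        params["mu"] = rng.gauss(mean, math.sqrt(1.0 / prec))
        self.n_updates += 1
        return params

class InvGammaVarianceBlock:
    """Conjugate Gibbs update for sigma2 | mu, y under an InvGamma(a0, b0) prior."""
    def __init__(self, y, a0=1.0, b0=1.0):
        self.y, self.a0, self.b0 = y, a0, b0

    def step(self, params, rng):
        shape = self.a0 + 0.5 * len(self.y)
        rate = self.b0 + 0.5 * sum((v - params["mu"]) ** 2 for v in self.y)
        # random.gammavariate takes (shape, scale), so scale = 1 / rate;
        # the reciprocal of a Gamma(shape, rate) draw is InvGamma(shape, rate).
        params["sigma2"] = 1.0 / rng.gammavariate(shape, 1.0 / rate)
        return params

class SequentialBlock:
    """Composite block: runs its children in order and is itself a block."""
    def __init__(self, *children):
        self.children = list(children)

    def step(self, params, rng):
        for child in self.children:
            params = child.step(params, rng)
        return params

def run_gibbs(y, n_iter=2000, seed=0):
    """Run the composed sampler; returns all draws and the mean block."""
    rng = random.Random(seed)
    mu_block = NormalMeanBlock(y)
    var_block = InvGammaVarianceBlock(y)
    # Nesting a composite inside another composite shows the recursion.
    sampler = SequentialBlock(SequentialBlock(mu_block), var_block)
    params = {"mu": 0.0, "sigma2": 1.0}
    draws = []
    for _ in range(n_iter):
        params = sampler.step(params, rng)
        draws.append(dict(params))  # copy so later updates do not alter history
    return draws, mu_block

if __name__ == "__main__":
    data_rng = random.Random(1)
    y = [data_rng.gauss(3.0, 2.0) for _ in range(500)]
    draws, mu_block = run_gibbs(y)
    kept = draws[500:]  # discard burn-in
    print(sum(d["mu"] for d in kept) / len(kept))
    print(sum(d["sigma2"] for d in kept) / len(kept))
```

Because the priors are weak, the averaged draws of `mu` and `sigma2` after burn-in should be close to the sample mean and sample variance of `y`. Swapping one block for another, such as replacing the conjugate variance update with a Metropolis step, would leave the composite unchanged. That is the kind of composability the abstract describes.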