Probabilistic programming offers a powerful framework for modeling uncertainty, yet statistical model discovery in this domain entails navigating an immense search space under strict domain-specific constraints. When small language models are tasked with generating probabilistic programs, they frequently produce outputs with both syntactic and semantic errors, such as flawed inference constructs. Motivated by probabilistic programmers' domain expertise and debugging strategies, we introduce RefineStat, a language-model-driven framework that enforces semantic constraints, ensuring synthesized programs contain valid distributions and well-formed parameters, and then applies diagnostic-aware refinement by resampling prior or likelihood components whenever reliability checks fail. We evaluate RefineStat on multiple probabilistic-programming code-generation tasks using small language models (SLMs) and find that it produces programs that are both syntactically sound and statistically reliable, often matching or surpassing those from closed-source large language models (e.g., OpenAI o3).
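The refinement loop described above can be sketched in a few lines. This is a minimal, self-contained illustration of the control flow only, not the paper's implementation: the names `check_reliability`, `resample_component`, and the dictionary-based program representation are assumptions, and the stochastic resampler stands in for querying a language model.

```python
import random

def check_reliability(program):
    # Stand-in reliability check: a program passes when both its
    # prior and likelihood components are marked valid. In practice
    # this would run statistical diagnostics on inference results.
    return all(program[part]["valid"] for part in ("prior", "likelihood"))

def resample_component(program, part, rng):
    # Stand-in for resampling a prior or likelihood component from a
    # language model: draw a fresh candidate that passes its check
    # with some probability (hypothetical acceptance rate of 0.5).
    program[part] = {"source": f"{part}_candidate", "valid": rng.random() < 0.5}

def refine(program, max_rounds=20, seed=0):
    # Diagnostic-aware refinement: resample only the failing
    # components until the whole program passes reliability checks
    # or the round budget is exhausted.
    rng = random.Random(seed)
    for _ in range(max_rounds):
        if check_reliability(program):
            return program
        for part in ("prior", "likelihood"):
            if not program[part]["valid"]:
                resample_component(program, part, rng)
    return program

draft = {"prior": {"source": "p0", "valid": False},
         "likelihood": {"source": "l0", "valid": False}}
refined = refine(draft)
```

The key design point this sketch captures is that refinement is targeted: only the component that failed its check (prior or likelihood) is resampled, rather than regenerating the entire program.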