Probabilistic programming offers a powerful framework for modeling uncertainty, yet statistical model discovery in this domain entails navigating an immense search space under strict domain-specific constraints. When small language models (SLMs) are tasked with generating probabilistic programs, they frequently produce outputs with both syntactic and semantic errors, such as flawed inference constructs. Motivated by probabilistic programmers' domain expertise and debugging strategies, we introduce RefineStat, a language-model-driven framework that works in two stages. It first enforces semantic constraints, ensuring that synthesized programs contain valid distributions and well-formed parameters. It then applies diagnostic-aware refinement, resampling the prior or likelihood components whenever reliability checks fail. We evaluate RefineStat with SLMs on multiple probabilistic-programming code-generation tasks and find that it produces programs that are both syntactically sound and statistically reliable, often matching or surpassing those from closed-source large language models (e.g., OpenAI o3).
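The two stages described above (a semantic check on distributions and parameters, then refinement driven by a reliability diagnostic) can be illustrated with a minimal sketch. It is not RefineStat's implementation: the abstract does not say which constraints or diagnostics are used, so the sketch assumes the Gelman-Rubin R-hat statistic as the reliability check, a hypothetical whitelist `VALID_DISTRIBUTIONS`, and a hypothetical `run_chains` callback standing in for the inference engine. Resampling by a language model is reduced here to walking through a fixed list of candidate components.

```python
import statistics

# Hypothetical whitelist (an assumption, not from the paper):
# distribution name -> predicate that accepts well-formed parameters.
VALID_DISTRIBUTIONS = {
    "Normal": lambda mu, sigma: sigma > 0,
    "HalfNormal": lambda sigma: sigma > 0,
    "Beta": lambda a, b: a > 0 and b > 0,
}


def is_well_formed(name, params):
    """Semantic check: known distribution, right arity, valid parameter values."""
    check = VALID_DISTRIBUTIONS.get(name)
    if check is None:
        return False  # unknown distribution name
    try:
        return bool(check(*params))
    except TypeError:
        return False  # wrong number or type of parameters


def r_hat(chains):
    """Gelman-Rubin R-hat for equal-length chains of one scalar quantity.

    Assumes at least two chains, at least two draws per chain, and nonzero
    within-chain variance.
    """
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


def refine(candidates, run_chains, threshold=1.01):
    """Return the first candidate that passes both checks, or None.

    `candidates` is an iterable of (name, params) pairs; `run_chains` is a
    hypothetical callback that returns MCMC chains for a candidate.
    """
    for name, params in candidates:
        if not is_well_formed(name, params):
            continue  # rejected by the semantic check, no inference is run
        if r_hat(run_chains(name, params)) <= threshold:
            return name, params  # reliability check passed
    return None  # every candidate failed
```

In this sketch, a candidate such as `("Normal", (0.0, -1.0))` is rejected before any inference runs because its scale is negative, while a well-formed candidate whose chains disagree is rejected by the R-hat threshold and the next candidate is tried. The threshold of 1.01 is a commonly used convention, not a value taken from the paper.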