Large language models show impressive results at predicting structured text such as code, but also commonly introduce errors and hallucinations in their output. When used to assist software developers, these models may make mistakes that users must go back and fix, or worse, introduce subtle bugs that users may miss entirely. We propose Randomized Utility-driven Synthesis of Uncertain REgions (R-U-SURE), an approach for building uncertainty-aware suggestions based on a decision-theoretic model of goal-conditioned utility, using random samples from a generative model as a proxy for the unobserved possible intents of the end user. Our technique combines minimum-Bayes-risk decoding, dual decomposition, and decision diagrams in order to efficiently produce structured uncertainty summaries, given only sample access to an arbitrary generative model of code and an optional AST parser. We demonstrate R-U-SURE on three developer-assistance tasks, and show that it can be applied to different user interaction patterns without retraining the model and that it leads to more accurate uncertainty estimates than token-probability baselines. We also release our implementation as an open-source library at https://github.com/google-research/r_u_sure.
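To make the role of model samples concrete, the following is a minimal sketch of plain sample-based minimum-Bayes-risk selection, in which each sample stands in for one possible user intent and the suggestion with the highest average utility across samples is chosen. It is not the R-U-SURE algorithm and does not use the released library's API: the function names `token_overlap_utility` and `mbr_select`, the token-level F1 utility, and the example samples are all assumptions made for illustration, and the sketch omits dual decomposition, decision diagrams, and the marking of uncertain regions.

```python
from collections import Counter


def token_overlap_utility(suggestion: str, intent: str) -> float:
    """Hypothetical utility: F1 overlap between whitespace-separated tokens."""
    s_tokens = Counter(suggestion.split())
    i_tokens = Counter(intent.split())
    common = sum((s_tokens & i_tokens).values())
    if common == 0:
        return 0.0
    precision = common / sum(s_tokens.values())
    recall = common / sum(i_tokens.values())
    return 2 * precision * recall / (precision + recall)


def mbr_select(candidates, samples, utility=token_overlap_utility):
    """Return the candidate with the highest average utility over the samples.

    Each sample is treated as one hypothetical user intent, so the average
    is a Monte Carlo estimate of expected utility under the model.
    """
    def expected_utility(candidate):
        return sum(utility(candidate, s) for s in samples) / len(samples)

    return max(candidates, key=expected_utility)


# Illustrative samples (made up for this sketch) standing in for model output.
samples = [
    "return a + b",
    "return a + b",
    "return a - b",
    "return sum([a, b])",
]

# Use the samples themselves as the candidate suggestions.
best = mbr_select(candidates=samples, samples=samples)
print(best)
```

In this sketch the suggestion that agrees most with the other samples is selected, which mirrors the idea of treating samples as a proxy for unobserved intents; a utility that also rewards flagging uncertain spans would be needed to produce uncertainty-aware suggestions.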