Large language model (LLM) agents integrate external tools with one or more LLMs to accomplish specific tasks. Agents have been rapidly adopted by developers and are starting to be deployed in industrial workflows, such as fixing static analysis issues reported by the widely used SonarQube static analyzer. However, the growing importance of agents means their actions carry greater impact and potential risk. Thus, to use them at scale, an additional layer of trust and evidence is necessary. This work presents AutoCodeSherpa, a technique that provides explanations of software issues in the form of symbolic formulae. Inspired by the reachability, infection, and propagation model of software faults, the explanations are composed of input, infection, and output conditions, which collectively provide a specification of the issue. In practice, a symbolic explanation is implemented as a combination of a property-based test (PBT) and program-internal symbolic expressions. Critically, this means our symbolic explanations are executable and can be evaluated automatically, unlike natural language explanations. Experiments show the generated conditions are highly accurate; for example, input conditions from AutoCodeSherpa had an accuracy of 85.7%. This high accuracy makes symbolic explanations particularly useful in two scenarios. First, the explanations can be used in automated issue resolution environments to decide whether to accept or reject patches from issue resolution agents; AutoCodeSherpa rejected twice as many incorrect patches as the baselines did. Second, as agentic AI approaches continue to develop, program-analysis-driven explanations like ours can be provided to other LLM-based repair techniques that do not employ program analysis, in order to improve their output. In our experiments, our symbolic explanations improved the plausible patch generation rate of the Agentless technique by 60%.
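To make the input/output-condition structure concrete, the following is a minimal sketch of how an executable symbolic explanation could look. The buggy function, the specific conditions, and the sampling-based test harness are all hypothetical illustrations (they are not taken from the paper's benchmarks, and real PBT frameworks such as Hypothesis would typically be used instead of manual sampling):

```python
import random

# Hypothetical buggy function (illustrative only): division that
# truncates toward zero instead of flooring for mixed-sign operands.
def floor_div(a, b):
    return int(a / b)  # bug: truncation, not floor division

# Input condition: inputs that reach the fault and trigger the issue.
def input_condition(a, b):
    return b != 0 and a % b != 0 and (a < 0) != (b < 0)

# Output condition: the observable symptom of the issue at the output.
def output_condition(a, b, result):
    return result != a // b  # disagrees with correct floor division

# Property-based test: sample inputs satisfying the input condition and
# check that the output condition holds, i.e., the issue is reproduced.
def issue_reproduced(fn, trials=1000, seed=0):
    rng = random.Random(seed)
    reproduced = False
    for _ in range(trials):
        a, b = rng.randint(-100, 100), rng.randint(-100, 100)
        if not input_condition(a, b):
            continue
        if not output_condition(a, b, fn(a, b)):
            return False  # a triggering input did not show the symptom
        reproduced = True
    return reproduced
```

A patch-validation loop could then accept a candidate patch only if the same test no longer reproduces the issue, e.g. `issue_reproduced(lambda a, b: a // b)` returns `False` for a correct fix while `issue_reproduced(floor_div)` returns `True`. An infection condition would additionally constrain a program-internal state, which is why the paper pairs the PBT with program-internal symbolic expressions.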