Probe-and-Refine Tuning of Repository Guidance for Coding Agents - Zhuanzhi Papers

Guidance · Agent · tuning · Code · Precision/Accuracy

Probe-and-Refine Tuning of Repository Guidance for Coding Agents

Asa Shepard, Jeannie Albrecht

LLM-based coding agents need higher-level operational knowledge about a repository (which files house which subsystems, how to run the test suite, which workflows have historically led to wrong fixes) that does not exist in the code itself. Engineers typically maintain AGENTS.md files to supply this context as instructions for coding agents, but whether they help is contested: recent studies disagree on whether LLM-generated guidance improves or harms agent performance. In this paper we show that how the guidance is produced is the decisive variable, and introduce probe-and-refine tuning: a procedure that uses synthetic bug-fix probes to iteratively diagnose and patch a repository's guidance file through single-shot LLM calls, with no agent loop or tool use during tuning. On SWE-bench Verified across four independent trials with Qwen3.5-35B-A3B at 200 steps, probe-and-refine achieves 33.0% mean resolve rate vs. 28.3% for the static knowledge base used to initialize it and 25.5% for an unguided baseline (p < 0.001 for both probe-and-refine contrasts). The improvement comes from coverage rather than precision: refined guidance produces evaluable patches for 14.5 percentage points (pp) more instances while per-patch precision remains statistically constant (~59%, p = 0.119), showing that improved guidance helps agents reach the correct file rather than improving the quality of the changes they make. Further, a step-budget experiment shows that guidance is what lets the agent use a larger step budget productively, and a cross-model experiment with NVIDIA-Nemotron-3-Nano-30B-A3B finds that the tuning loop degrades when the model cannot generate sufficiently diagnostic output, though per-patch precision remains constant even then.
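The abstract describes the tuning loop only at a high level. The sketch below shows one possible reading of it, under loudly stated assumptions. The names `LLM`, `Probe`, `probe_passes`, `tune_guidance`, and `coverage_precision_resolve` are hypothetical. The prompts, the "model names the correct file" pass criterion, and the fixed round budget are illustrative choices, not the authors' implementation. The final function illustrates the coverage/precision decomposition behind the abstract's claim. It assumes coverage is the share of instances with an evaluable patch and precision is the share of those patches that resolve the issue.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical interface: one single-shot LLM call (prompt in, text out).
# No agent loop and no tool use, in line with the abstract's description of tuning.
LLM = Callable[[str], str]


@dataclass
class Probe:
    """A synthetic bug-fix probe (assumed shape): an issue and the file that holds the fix."""
    issue: str
    gold_file: str


def probe_passes(llm: LLM, guidance: str, probe: Probe) -> bool:
    # Assumed pass criterion: given only the guidance, the model names the right file.
    answer = llm(
        f"GUIDANCE:\n{guidance}\n\nISSUE:\n{probe.issue}\n\n"
        "Which file should be edited? Reply with one path."
    )
    return answer.strip() == probe.gold_file


def tune_guidance(llm: LLM, guidance: str, probes: List[Probe], rounds: int = 3) -> str:
    """Iteratively diagnose and patch a guidance file (e.g. AGENTS.md) using probes."""
    for _ in range(rounds):
        failures = [p for p in probes if not probe_passes(llm, guidance, p)]
        if not failures:
            break  # every probe already passes; stop early
        report = "\n".join(f"- {p.issue} (fix lives in {p.gold_file})" for p in failures)
        # Diagnose: one call asking what the guidance is missing or gets wrong.
        diagnosis = llm(
            f"GUIDANCE:\n{guidance}\n\nFAILED PROBES:\n{report}\n\n"
            "What is missing or misleading in the guidance?"
        )
        # Refine: one call that returns the full revised guidance file.
        guidance = llm(
            f"GUIDANCE:\n{guidance}\n\nDIAGNOSIS:\n{diagnosis}\n\n"
            "Return the revised guidance file only."
        )
    return guidance


def coverage_precision_resolve(outcomes: List[Optional[bool]]) -> Tuple[float, float, float]:
    """outcomes[i] is None if instance i yields no evaluable patch, else whether that patch resolves it."""
    evaluable = [o for o in outcomes if o is not None]
    coverage = len(evaluable) / len(outcomes)
    precision = sum(evaluable) / len(evaluable) if evaluable else 0.0
    resolve = sum(1 for o in outcomes if o) / len(outcomes)
    # Identity (up to float rounding): resolve == coverage * precision.
    return coverage, precision, resolve
```

Passing the model as a plain string-to-string callable keeps agent and tool machinery out of the tuning loop, so any completion client could be plugged in. The decomposition shows why the abstract attributes the gain to coverage: if per-patch precision stays roughly constant (about 59% in the reported runs), any rise in resolve rate must come from more instances producing an evaluable patch.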
