SVRepair: Structured Visual Reasoning for Automated Program Repair

Large language models (LLMs) have recently shown strong potential for Automated Program Repair (APR), yet most existing approaches remain unimodal and fail to leverage the rich diagnostic signals contained in visual artifacts such as screenshots and control-flow graphs. In practice, many bug reports convey critical information visually (e.g., layout breakage or missing widgets), but feeding such dense visual inputs directly to a model often causes context loss and noise, making it difficult for multimodal LLMs (MLLMs) to ground visual observations in precise fault localization and executable patches. To bridge this semantic gap, we propose \textbf{SVRepair}, a multimodal APR framework built on a structured visual representation. SVRepair first fine-tunes a vision-language model, \textbf{Structured Visual Representation (SVR)}, to uniformly transform heterogeneous visual artifacts into a \emph{semantic scene graph} that captures GUI elements and their structural relations (e.g., hierarchy), providing normalized, code-relevant context for downstream repair. Building on this graph, SVRepair drives a coding agent to localize faults and synthesize patches. It further introduces an iterative visual-artifact segmentation strategy that progressively narrows the input to bug-centered regions, suppressing irrelevant context and reducing hallucinations. Extensive experiments across multiple benchmarks demonstrate state-of-the-art performance: SVRepair achieves \textbf{36.47\%} accuracy on SWE-Bench M, \textbf{38.02\%} on MMCode, and \textbf{95.12\%} on CodeVision, validating the effectiveness of SVRepair for multimodal program repair.
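To make the two central ideas concrete, the following is a minimal illustrative sketch, not the SVRepair implementation. It assumes a scene graph can be modeled as a tree of GUI elements with bounding boxes, and it approximates "progressively narrowing to a bug-centered region" as descending to the smallest element that still encloses a suspect region. All names here (`Node`, `contains`, `narrow`, `serialize`, the example page) are hypothetical; in the paper the graph is produced by the fine-tuned SVR model and the segmentation operates on the visual artifact itself.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical box format: (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class Node:
    """One GUI element in an assumed, simplified semantic scene graph."""
    name: str
    kind: str
    box: Box
    children: List["Node"] = field(default_factory=list)


def contains(outer: Box, inner: Box) -> bool:
    """Return True if `inner` lies entirely within `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def narrow(root: Node, suspect: Box, max_steps: int = 10) -> List[str]:
    """Descend step by step to the smallest element enclosing `suspect`.

    Returns the names along the path from the root to that element.
    """
    path = [root.name]
    node = root
    for _ in range(max_steps):
        # Pick the first child whose box still encloses the suspect region.
        nxt = next((c for c in node.children if contains(c.box, suspect)), None)
        if nxt is None:
            break
        path.append(nxt.name)
        node = nxt
    return path


def serialize(node: Node, depth: int = 0) -> str:
    """Render the hierarchy as indented text, e.g. as context for a coding agent."""
    line = "  " * depth + f"{node.kind} {node.name} {node.box}"
    return "\n".join([line] + [serialize(c, depth + 1) for c in node.children])


# Hypothetical page: a header with a logo and a form with a submit button.
page = Node("page", "window", (0, 0, 800, 600), [
    Node("header", "container", (0, 0, 800, 80), [
        Node("logo", "image", (10, 10, 110, 70)),
    ]),
    Node("form", "container", (0, 100, 800, 500), [
        Node("submit", "button", (300, 400, 500, 450)),
    ]),
])

# Suppose the bug report highlights a region inside the submit button.
path = narrow(page, (320, 410, 480, 440))
print(path)
print(serialize(page))
```

In this toy setting the narrowed path names the elements a repair agent would inspect first, while the serialized hierarchy stands in for the normalized, code-relevant context that the abstract attributes to the scene graph.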
