Bug reports are often unstructured and verbose, making it challenging for developers to efficiently comprehend software issues. Existing summarization approaches typically rely on surface-level textual cues, resulting in incomplete or redundant summaries, and they frequently ignore associated code snippets, which are essential for accurate defect diagnosis. To address these limitations, we propose a progressive code-integration framework for LLM-based abstractive bug report summarization. Our approach incrementally incorporates long code snippets alongside textual content, overcoming standard LLM context window constraints and producing semantically rich summaries. Evaluated on four benchmark datasets using eight LLMs, our pipeline outperforms extractive baselines by 7.5%-58.2% and achieves performance comparable to state-of-the-art abstractive methods, highlighting the benefits of jointly leveraging textual and code information for enhanced bug comprehension.
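The abstract does not spell out how the progressive integration works, so the following is only a minimal sketch of one plausible reading: summarize the report text first, then refine that summary one code chunk at a time so that no single prompt has to hold the whole snippet. The names `split_code`, `progressive_summarize`, and `max_chars`, the prompt wording, and the character-based budget are all assumptions made for illustration, not the authors' pipeline; `llm` stands for any callable that maps a prompt string to a completion string.

```python
from typing import Callable, List


def split_code(code: str, max_chars: int) -> List[str]:
    """Split code on line boundaries into chunks of roughly max_chars characters.

    A single line longer than max_chars is kept whole as its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for line in code.splitlines(keepends=True):
        # Flush the current chunk before it would exceed the budget.
        if current and size + len(line) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks


def progressive_summarize(
    report_text: str,
    code: str,
    llm: Callable[[str], str],
    max_chars: int = 2000,
) -> str:
    """Hypothetical progressive integration: text first, then code chunk by chunk."""
    # Step 1: draft a summary from the textual content alone.
    summary = llm(f"Summarize this bug report:\n{report_text}")
    # Step 2: fold in each code chunk, carrying the running summary forward.
    for chunk in split_code(code, max_chars):
        summary = llm(
            "Refine the bug summary using the additional code.\n"
            f"Current summary:\n{summary}\n\nCode:\n{chunk}"
        )
    return summary
```

Under these assumptions each prompt contains only the running summary plus one chunk, so prompt size stays bounded by `max_chars` plus the summary length, regardless of how long the attached snippet is. A real system would more likely budget in tokens than in characters.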