Exception handling is a vital forward error-recovery mechanism in many programming languages, enabling developers to manage runtime anomalies through structured constructs (e.g., try-catch blocks). Improper or missing exception handling often leads to severe consequences, including system crashes and resource leaks. While large language models (LLMs) have demonstrated strong capabilities in code generation, they struggle with exception handling at the repository level because of complex dependencies and contextual constraints. In this work, we propose CatchAll, a novel LLM-based approach for repository-aware exception handling. CatchAll equips LLMs with three complementary layers of exception-handling knowledge: (1) API-level exception knowledge, obtained from an empirically constructed API-exception mapping that characterizes the exception-throwing behaviors of APIs in real-world codebases; (2) repository-level execution context, which captures exception propagation by modeling contextual call traces around the target code; and (3) cross-repository handling knowledge, distilled from reusable exception-handling patterns mined from historical code across projects. This knowledge is encoded into structured prompts that guide the LLM in generating accurate and context-aware exception-handling code. To evaluate CatchAll, we construct two new benchmarks for repository-aware exception handling: a large-scale dataset, RepoExEval, and its executable subset, RepoExEval-Exec. Experiments demonstrate that CatchAll consistently outperforms state-of-the-art baselines, achieving a CodeBLEU score of 0.31 (vs. 0.27 for the best baseline), an intent prediction accuracy of 60.1% (vs. 48.0%), and a Pass@1 of 29% (vs. 25%). These results confirm CatchAll's effectiveness for real-world repository-level exception handling.
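The abstract does not specify how the three knowledge layers are laid out in CatchAll's structured prompts. As a rough illustration only, the sketch below assumes a simple section-per-layer template. The names `ExceptionKnowledge` and `build_prompt`, the section headings, the data shapes, and the example Java APIs, call trace, and handling patterns are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class ExceptionKnowledge:
    """Hypothetical container for the three knowledge layers described in the abstract."""
    # Layer 1 (assumed shape): API signature -> exception types it may throw
    api_exceptions: dict[str, list[str]]
    # Layer 2 (assumed shape): call chain reaching the target code, outermost caller first
    call_trace: list[str]
    # Layer 3 (assumed shape): exception type -> reusable handling pattern mined across projects
    handling_patterns: dict[str, str] = field(default_factory=dict)


def build_prompt(target_code: str, apis_used: list[str], k: ExceptionKnowledge) -> str:
    """Assemble a structured prompt from the three layers (hypothetical template)."""
    lines = ["## Target code", target_code, "", "## API-level exception knowledge"]
    thrown: list[str] = []  # exception types in first-seen order, without duplicates
    for api in apis_used:
        excs = k.api_exceptions.get(api, [])
        for exc in excs:
            if exc not in thrown:
                thrown.append(exc)
        lines.append(f"- {api} may throw: {', '.join(excs) if excs else 'none recorded'}")

    # Layer 2: show how control reaches the target code, so propagation is visible
    lines += ["", "## Repository-level call trace"]
    lines.append(" -> ".join(k.call_trace) if k.call_trace else "(no trace available)")

    # Layer 3: only include patterns for exceptions the target code can actually raise
    lines += ["", "## Cross-repository handling patterns"]
    for exc in thrown:
        if exc in k.handling_patterns:
            lines.append(f"- {exc}: {k.handling_patterns[exc]}")

    lines += ["", "## Task",
              "Add try-catch handling to the target code for the exceptions listed above."]
    return "\n".join(lines)


# Illustrative usage with made-up repository data (Java APIs chosen as an example).
knowledge = ExceptionKnowledge(
    api_exceptions={
        "java.io.FileReader(String)": ["FileNotFoundException"],
        "java.io.BufferedReader.readLine()": ["IOException"],
    },
    call_trace=["Main.main", "ConfigLoader.load", "ConfigLoader.readFirstLine"],
    handling_patterns={
        "FileNotFoundException": "log the missing path and fall back to a default config",
        "IOException": "close the reader (e.g., try-with-resources) and rethrow with context",
    },
)
target = "BufferedReader r = new BufferedReader(new FileReader(path));\nString line = r.readLine();"
prompt = build_prompt(
    target,
    ["java.io.FileReader(String)", "java.io.BufferedReader.readLine()"],
    knowledge,
)
print(prompt)
```

Filtering the cross-repository patterns by the exceptions that the API layer says can actually be thrown is one plausible way to keep the prompt focused. The real approach may select or rank patterns differently.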