Explaining Competitive-Level Programming Solutions using LLMs

In this paper, we approach competitive-level programming problem-solving as a composite task of reasoning and code generation. We propose a novel method to automatically annotate natural language explanations to \textit{<problem, solution>} pairs. We show that despite poor performance in solving competitive-level programming problems, state-of-the-art LLMs exhibit a strong capacity in describing and explaining solutions. Our explanation generation methodology can generate a structured solution explanation for the problem containing descriptions and analysis. To evaluate the quality of the annotated explanations, we examine their effectiveness in two aspects: 1) satisfying the human programming expert who authored the oracle solution, and 2) aiding LLMs in solving problems more effectively. The experimental results on the CodeContests dataset demonstrate that while LLM GPT3.5's and GPT-4's abilities in describing the solution are comparable, GPT-4 shows a better understanding of the key idea behind the solution.

翻译：在本文中，我们将竞技级编程问题求解视为推理与代码生成的复合任务。我们提出了一种新颖方法，用于自动为\textit{<问题，解决方案>}对标注自然语言解释。我们证明，尽管最先进的大语言模型在解决竞技级编程问题时表现不佳，但其在描述和解释解决方案方面展现出强大能力。我们的解释生成方法能够生成包含描述与分析的结构化解决方案解释。为评估标注解释的质量，我们从两个方面检验其有效性：1) 满足编写原始解决方案的人类编程专家；2) 帮助大语言模型更有效地解决问题。在CodeContests数据集上的实验结果表明，尽管LLM GPT3.5与GPT-4在描述解决方案的能力上相当，但GPT-4对解决方案背后核心思想的理解更为透彻。

相关内容

GPT-4

关注 29

北京时间2023年3月15日凌晨，ChatGPT开发商OpenAI 发布了发布了全新的多模态预训练大模型 GPT-4，可以更可靠、更具创造力、能处理更细节的指令，根据图片和文字提示都能生成相应内容。具体来说来说，GPT-4 相比上一代的模型，实现了飞跃式提升：支持图像和文本输入，拥有强大的识图能力；大幅提升了文字输入限制，在ChatGPT模式下，GPT-4可以处理超过2.5万字的文本，可以处理一些更加细节的指令；回答准确性也得到了显著提高。

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

32+阅读 · 2021年9月29日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日