Writing competitive programming problems is exacting. Authors must set constraints, input distributions, and edge cases that rule out shortcuts; target specific algorithms (e.g., max-flow, dynamic programming, data structures); and calibrate difficulty to lie beyond the reach of most competitors. We argue that this makes problem setting an ideal test of general large language model (LLM) capabilities and study whether LLMs can do it reliably. We introduce AutoCode, which uses multiple rounds of validation to yield competition-grade problem statements and test cases. On held-out problems, AutoCode's test suites reach nearly 99% consistency with official judgments, a significant improvement over current state-of-the-art methods such as HardTests, which achieve less than 81%. Furthermore, starting from a random seed problem, AutoCode can create novel variants together with reference and brute-force solutions. By cross-verifying these generated solutions against the test cases, we can filter out malformed problems. Human experts verify that the system maintains high correctness, and Grandmaster-level (top 0.3%) competitive programmers judge the novel problems AutoCode produces to be of contest quality.
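A minimal sketch of the cross-verification idea: a candidate problem is kept only if its reference and brute-force solutions agree on every generated test input. The command names, string-based I/O convention, and timeout below are illustrative assumptions, not AutoCode's actual interface.

```python
import subprocess

def outputs_agree(ref_cmd, brute_cmd, test_inputs, timeout=10):
    """Cross-verify a reference solution against a brute-force solution.

    ref_cmd / brute_cmd are hypothetical command lines (e.g. ["./ref"]) that
    read a test case from stdin and print the answer to stdout. The problem
    variant is discarded as malformed if the two solutions ever disagree.
    """
    for case in test_inputs:
        ref_out = subprocess.run(
            ref_cmd, input=case, capture_output=True, text=True, timeout=timeout
        ).stdout.strip()
        brute_out = subprocess.run(
            brute_cmd, input=case, capture_output=True, text=True, timeout=timeout
        ).stdout.strip()
        if ref_out != brute_out:
            return False  # disagreement: flag this problem variant as malformed
    return True
```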