PPM: Automated Generation of Diverse Programming Problems for Benchmarking Code Generation Models

Recently, many Large Code Generation Models (LCGMs) have been proposed, showing significant potential for assisting developers with complex programming tasks. Benchmarking LCGMs requires a set of diverse programming problems, each of which comprises a prompt (including the task description), a canonical solution, and test inputs. Existing methods for constructing such a problem set fall into two main types: manual methods and perturbation-based methods. Manual methods demand high effort and lack scalability, and they also risk compromising data integrity because LCGMs' data collection is potentially contaminated. Perturbation-based approaches mainly generate semantically homogeneous problems that share the same canonical solutions, and they introduce typos that an IDE can easily auto-correct, which makes them ineffective and unrealistic. In this work, we propose the idea of programming problem merging (PPM) and provide two implementations of this idea. We apply our tool to two widely used datasets and compare it against nine baseline methods using eight code generation models. The results demonstrate that, compared to the baselines, our tool generates more challenging, diverse, and natural programming problems.
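To make the problem structure concrete, here is a minimal Python sketch of a benchmark problem as the abstract describes it (prompt, canonical solution, test inputs), together with one possible reading of "merging". The abstract does not say how PPM combines problems, so the `Problem` class, the `merge` function, and the choice to post-process the seed solution's output are all hypothetical illustrations, not the paper's two implementations.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Problem:
    """One benchmark item: prompt, canonical solution, and test inputs."""
    prompt: str
    solution: Callable[..., Any]
    test_inputs: List[Tuple[Any, ...]]


def merge(seed: Problem, extra_prompt: str, post: Callable[[Any], Any]) -> Problem:
    """Hypothetical merge (an assumption, not the paper's method):
    append a second task to the prompt and apply `post` to the
    seed solution's output, so the canonical solution changes too."""
    def merged_solution(*args: Any) -> Any:
        return post(seed.solution(*args))

    return Problem(
        prompt=seed.prompt + " " + extra_prompt,
        solution=merged_solution,
        test_inputs=list(seed.test_inputs),  # seed inputs are reused as-is
    )


def expected_outputs(problem: Problem) -> List[Any]:
    """Run the canonical solution on every test input."""
    return [problem.solution(*args) for args in problem.test_inputs]


# Illustrative seed problem (made up for this sketch).
seed = Problem(
    prompt="Return the sum of a list of integers.",
    solution=lambda xs: sum(xs),
    test_inputs=[([1, 2, 3],), ([],)],
)
merged = merge(seed, "Then return the square of that result.", lambda v: v * v)
```

Under this reading, the merged problem keeps the seed's test inputs but has a different canonical solution, which is the property the abstract says perturbation-based methods lack.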

