Software is used in critical applications in our day-to-day lives, and it is important to ensure its correctness. One popular approach to assessing correctness is to evaluate software on tests. If a test fails, it indicates a fault in the software under test; if all tests pass, one may assume that the software is correct. However, the reliability of this conclusion depends on the test suite considered, and there is a risk of false negatives (i.e., software that passes all available tests but contains bugs because some cases are not tested). It is therefore important to consider error-inducing test cases when evaluating software. To support data-driven creation of such test suites, which is of particular interest for testing software synthesized by large language models, we curate Codehacks, a dataset of programming problems together with corresponding error-inducing test cases (i.e., "hacks"). The dataset is collected from the wild, specifically from the Codeforces online judge platform. It comprises 288,617 hacks for 5,578 programming problems, each with a natural language description, as well as the source code of 2,196 submitted solutions that are broken by their corresponding hacks. Keywords: competitive programming, language model, dataset