Analyzing the worst-case time complexity of a program is a crucial task in computer science and software engineering for ensuring the efficiency, reliability, and robustness of software systems. However, determining the worst-case time complexity of an arbitrary program written in a general-purpose programming language is undecidable in general, a consequence of the halting problem, which Alan Turing proved undecidable. We therefore turn to a more realistic setting in which a program's inputs and outputs are given. In this setting the correctness of a program can be checked, yet exhaustively analyzing its time complexity remains challenging. In response to this challenge, we introduce CodeComplex, a novel source code dataset in which each program is manually annotated with its worst-case time complexity. CodeComplex comprises 4,900 Java programs and an equal number of Python programs, all sourced from programming competitions and annotated with complexity labels by a panel of algorithmic experts. To the best of our knowledge, CodeComplex is the largest code dataset designed for complexity prediction. We then report experiments with a range of baseline models built on state-of-the-art neural models for code comprehension, including CodeBERT, GraphCodeBERT, UniXcoder, PLBART, CodeT5, CodeT5+, and ChatGPT, and analyze how the dataset affects the models' learning in predicting time complexity.