Large Language Models (LLMs) have had considerable difficulty when prompted with mathematical questions, especially those within theory of computing (ToC) courses. In this paper, we detail two experiments involving our own ToC course and the ChatGPT LLM. In the first, we evaluated ChatGPT's ability to pass our ToC course's exams. In the second, we created a database of sample ToC questions and responses designed to accommodate the varying topic and structure choices of other ToC offerings, and scored each of ChatGPT's outputs on these questions. Overall, we determined that ChatGPT can pass our ToC course, and that it adequately understands common formal definitions and answers "simple"-style questions, e.g., true/false and multiple choice. However, ChatGPT often makes nonsensical claims in open-ended responses, such as proofs.