The ease of using Large Language Models (LLMs) to answer a wide variety of queries, together with their high availability, has led to LLMs being integrated into many applications. LLM-based recommenders are now routinely used by students as well as professional software programmers for code generation and testing. Although LLM-based technology has proven useful, its unethical and unattributed use by students and professionals is a growing cause for concern. There is therefore a need for tools and technologies that can assist teachers and other evaluators in identifying whether any portion of a piece of source code is LLM-generated. In this paper, we propose a neural network-based tool that instructors can use to determine the original effort put in by students when writing source code, and the corresponding contribution of the LLM. Our tool is motivated by minimum description length measures such as Kolmogorov complexity. Our initial experiments with moderate-sized programs (up to 500 lines of code) have shown promising results, which we report in this paper.