This paper presents an overview of the AuTexTification shared task, held as part of the IberLEF 2023 workshop (Iberian Languages Evaluation Forum) within the framework of the SEPLN 2023 conference. AuTexTification consists of two subtasks. In Subtask 1, participants had to determine whether a text was written by a human or generated by a large language model. In Subtask 2, participants had to attribute a machine-generated text to one of six text generation models. The AuTexTification 2023 dataset contains more than 160,000 texts across two languages (English and Spanish) and five domains (tweets, reviews, news, legal, and how-to articles). A total of 114 teams signed up to participate, of which 36 submitted a total of 175 runs, and 20 of those also submitted working notes. In this overview, we present the AuTexTification dataset and task, the participating systems, and the results.