MGTBench: Benchmarking Machine-Generated Text Detection

Large language models (LLMs) have shown revolutionary power in a variety of natural language processing (NLP) tasks such as text classification, sentiment analysis, language translation, and question answering. As LLMs become more advanced and prevalent, detecting machine-generated texts (MGTs) is becoming increasingly important. These models can generate human-like language that is difficult to distinguish from human-written text, which raises concerns about authenticity, accountability, and potential bias. However, existing MGT detection methods are evaluated under different model architectures, datasets, and experimental settings, so a comprehensive evaluation framework that spans the different methodologies is lacking. In this paper, we fill this gap by proposing the first benchmark framework for MGT detection, named MGTBench. Extensive evaluations on public datasets with curated answers generated by ChatGPT (the most representative and powerful LLM thus far) show that most current detection methods perform less than satisfactorily against MGTs. An exception is ChatGPT Detector, which is trained on ChatGPT-generated texts and performs well in detecting MGTs. Nonetheless, we note that only a small fraction of adversarially crafted perturbations on MGTs is needed to evade ChatGPT Detector, which highlights the need for more robust MGT detection methods. We envision that MGTBench will serve as a benchmark tool to accelerate future investigations involving the evaluation of state-of-the-art MGT detection methods on their respective datasets and the development of more advanced MGT detection methods. Our source code and datasets are available at https://github.com/xinleihe/MGTBench.
