M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection

Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Osama Mohammed Afzal, Tarek Mahmoud, Giovanni Puccetti, Thomas Arnold, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov

From arXiv, 29 pages

The advent of Large Language Models (LLMs) has brought an unprecedented surge in machine-generated text (MGT) across diverse channels. This raises legitimate concerns about its potential misuse and societal implications. Identifying such content and differentiating it from genuine human-written text is critical for combating disinformation, preserving the integrity of education and science, and maintaining trust in communication. In this work, we address this problem by introducing a new benchmark, M4GT-Bench, based on a multilingual, multi-domain, and multi-generator corpus of MGTs. The benchmark comprises three tasks: (1) monolingual and multilingual binary MGT detection; (2) multi-way detection, where one needs to identify which particular model generated the text; and (3) mixed human-machine text detection, where the word boundary separating MGT from human-written content must be determined. On the developed benchmark, we test several MGT detection baselines and also evaluate human performance. We find that obtaining good performance in MGT detection usually requires access to training data from the same domain and the same generators. The benchmark is available at https://github.com/mbzuai-nlp/M4GT-Bench.


