Large language models (LLMs) can now generate highly fluent text. While they offer significant convenience, they also introduce risks such as phishing and academic dishonesty. Numerous research efforts have been dedicated to developing algorithms for detecting AI-generated text and to constructing relevant datasets. For Chinese corpora, however, challenges remain, including limited model diversity and data homogeneity. To address these issues, we propose C-ReD, a comprehensive Chinese Real-prompt AI-generated Detection benchmark. Experiments demonstrate that C-ReD not only enables reliable in-domain detection but also supports strong generalization to unseen LLMs and external Chinese datasets, addressing critical gaps in model diversity, domain coverage, and prompt realism that have limited prior Chinese detection benchmarks. We release our resources at https://github.com/HeraldofLight/C-ReD.