Recent advances in natural language processing (NLP) have led to the development of large language models (LLMs) such as ChatGPT. This paper proposes a methodology for developing and evaluating ChatGPT detectors for French text, with a focus on their robustness on out-of-domain data and against common attack schemes. The method translates an English dataset into French and trains a classifier on the translated data. Results show that the detectors can effectively detect ChatGPT-generated text and retain a degree of robustness against basic attack techniques in in-domain settings. However, vulnerabilities are evident in out-of-domain contexts, highlighting the challenge of detecting adversarial text. The study emphasizes caution when generalizing in-domain test results to a wider variety of content. We provide our translated datasets and models as open-source resources at https://gitlab.inria.fr/wantoun/robust-chatgpt-detection
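The evaluation protocol described above (in-domain accuracy, accuracy under a basic attack, and out-of-domain accuracy) can be sketched as follows. This is a minimal illustration and not the authors' code: `toy_detector`, `homoglyph_attack`, `robustness_report`, and the example sentences are all hypothetical stand-ins. The homoglyph substitution is an assumed example of a basic attack, not necessarily one studied in the paper, and a real detector would be a trained classifier rather than a keyword rule.

```python
from typing import Callable, Dict, List, Tuple

# (text, label) pairs; label 1 = machine-generated, 0 = human-written.
Example = Tuple[str, int]


def accuracy(detector: Callable[[str], int], examples: List[Example]) -> float:
    """Fraction of examples for which the detector predicts the gold label."""
    if not examples:
        raise ValueError("examples must not be empty")
    correct = sum(1 for text, label in examples if detector(text) == label)
    return correct / len(examples)


def robustness_report(
    detector: Callable[[str], int],
    in_domain: List[Example],
    out_of_domain: List[Example],
    attack: Callable[[str], str],
) -> Dict[str, float]:
    """Compare clean in-domain, attacked in-domain, and out-of-domain accuracy."""
    attacked = [(attack(text), label) for text, label in in_domain]
    return {
        "in_domain": accuracy(detector, in_domain),
        "in_domain_attacked": accuracy(detector, attacked),
        "out_of_domain": accuracy(detector, out_of_domain),
    }


def toy_detector(text: str) -> int:
    """Hypothetical stand-in for a trained classifier: a single keyword rule."""
    return 1 if "en tant que" in text.lower() else 0


def homoglyph_attack(text: str) -> str:
    """Assumed example attack: swap Latin 'e' for the look-alike Cyrillic 'е'."""
    return text.replace("e", "\u0435")


# Hypothetical toy data, for illustration only.
in_domain = [
    ("En tant que modèle de langage, je ne peux pas répondre.", 1),
    ("Bof, j'ai pas trop aimé ce film.", 0),
]
out_of_domain = [
    ("Voici un résumé clair et structuré du document.", 1),
    ("Le patient présente une fièvre depuis trois jours.", 0),
]

report = robustness_report(toy_detector, in_domain, out_of_domain, homoglyph_attack)
print(report)
```

Reporting the three numbers side by side makes the abstract's warning concrete: a detector can score well on clean in-domain text while degrading on perturbed or out-of-domain inputs, so the in-domain figure alone should not be generalized.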