ChatGPT is one of the most popular language models and achieves impressive performance on a variety of natural language tasks. Consequently, there is an urgent need to distinguish text generated by ChatGPT from human-written text. One extensively studied approach trains classification models to tell the two apart. However, existing studies also show that the trained models may suffer from distribution shift at test time, i.e., they are ineffective at detecting generated text from unseen language tasks or topics. In this work, we comprehensively investigate the generalization behavior of these methods under distribution shifts caused by a wide range of factors, including prompts, text lengths, topics, and language tasks. To this end, we first collect a new dataset of human-written and ChatGPT-generated texts, and then conduct extensive studies on it. Our studies yield findings that provide guidance for developing future methodologies or data collection strategies for ChatGPT detection.