Large language models (LLMs) such as ChatGPT are increasingly used for a wide range of applications, including text content generation at scale. Although detection methods for such AI-generated text already exist, we investigate ChatGPT's performance as a detector of AI-generated text, inspired by work that uses ChatGPT as a data labeler or annotator. We evaluate the zero-shot performance of ChatGPT on the task of distinguishing human-written from AI-generated text, and perform experiments on publicly available datasets. We empirically investigate whether ChatGPT is symmetrically effective at detecting AI-generated and human-written text. Our findings provide insight into how ChatGPT and similar LLMs may be leveraged in automated detection pipelines by focusing on solving one specific aspect of the problem and deriving the rest from that solution. All code and data are available at \url{https://github.com/AmritaBh/ChatGPT-as-Detector}.