Pre-trained language models (PLMs) have made significant progress in text generation, and as a result, distinguishing human-written from machine-generated text has become an escalating challenge. This paper offers an in-depth evaluation of three distinct approaches to this task: traditional shallow learning, language model (LM) fine-tuning, and multilingual model fine-tuning. These approaches are rigorously tested on a wide range of machine-generated texts, providing a benchmark of their ability to distinguish human-authored from machine-authored text. The results reveal considerable differences in performance across methods, underscoring the continued need for progress in this important area of NLP. This study offers insights that can guide future research toward robust, highly discriminative detection models.