Pre-trained language models (PLMs) have made significant progress in text generation, which makes distinguishing human-written text from machine-generated text an increasingly difficult challenge. This paper presents an in-depth evaluation of three distinct approaches to this task: traditional shallow learning, language model (LM) fine-tuning, and multilingual model fine-tuning. These approaches are rigorously tested on a wide range of machine-generated texts, providing a benchmark of how well each one separates human-authored from machine-authored text. The results reveal considerable differences in performance across the methods, underscoring the need for continued progress in this important area of NLP. This study offers insights that can guide future research toward robust and highly discriminative detection models.