Pre-trained language models (PLMs) have made significant progress in text generation, yet distinguishing human-written from machine-generated text remains an increasingly difficult challenge. This paper presents an in-depth evaluation of three methods for this task: traditional shallow learning, language model (LM) fine-tuning, and multilingual model fine-tuning. The methods are tested on a wide range of machine-generated texts, providing a benchmark of how well each separates human-authored from machine-authored text. The results reveal considerable differences in performance across the methods, underscoring the continued need for progress in this area of NLP. The study offers insights for future research aimed at building robust and highly discriminative models.