Large Language Models (LLMs) can now generate highly fluent, human-like text. This enables many applications but also raises concerns such as large-scale spam, phishing, and academic misuse. While much work has focused on detecting LLM-generated text, comparatively little has examined the stylistic differences between human-written and machine-generated text. In this work, we perform a large-scale analysis of stylistic variation across human-written text and the outputs of 11 LLMs, covering 8 genres and 4 decoding strategies, using Douglas Biber's set of lexicogrammatical and functional features. Our findings can guide intentional LLM usage. First, the key linguistic differentiators of LLM-generated text appear robust to generation conditions (e.g., prompt settings that nudge the model toward human-like text, or the availability of human-written text whose style it can continue). Second, genre exerts a stronger influence on stylistic features than the source itself. Third, chat variants of the models generally cluster together in stylistic space. Finally, the model has a larger effect on style than the decoding strategy, with some exceptions. These results highlight the relative importance of model and genre over prompting and decoding strategies in shaping the stylistic behavior of machine-generated text.