We conduct a quantitative analysis contrasting human-written English news text with comparable output from four large language models (LLMs) of the LLaMa family. Our analysis spans several measurable linguistic dimensions, including morphological, syntactic, psychometric, and sociolinguistic aspects. The results reveal various measurable differences between human-written and LLM-generated texts. Among other findings, human texts exhibit more scattered sentence length distributions, a distinct use of dependency and constituent types, shorter constituents, and more aggressive emotions (fear, disgust) than LLM-generated texts. LLM outputs use more numbers, symbols, and auxiliaries (suggesting objective language) than human texts, as well as more pronouns. The sexist bias prevalent in human text is also expressed by LLMs.