Decoding AI Authorship: Can LLMs Truly Mimic Human Style Across Literature and Politics?

As generative AI grows more capable of mimicking specific human styles, this study investigates the ability of state-of-the-art large language models (LLMs), including GPT-4o, Gemini 1.5 Pro, and Claude Sonnet 3.5, to emulate the authorial signatures of prominent literary and political figures: Walt Whitman, William Wordsworth, Donald Trump, and Barack Obama. Using a zero-shot prompting framework with strict thematic alignment, we generated synthetic corpora and evaluated them with two complementary approaches: transformer-based classification (BERT) and interpretable machine learning (XGBoost). Our methodology integrates Linguistic Inquiry and Word Count (LIWC) markers, perplexity, and readability indices to assess the divergence between AI-generated and human-authored text. Results demonstrate that AI-generated mimicry remains highly detectable, with XGBoost models trained on a restricted set of eight stylometric features achieving accuracy comparable to high-dimensional neural classifiers. Feature importance analyses identify perplexity as the primary discriminative metric, revealing a significant divergence between the stochastic regularity of AI outputs and the higher variability of human writing. While LLMs exhibit distributional convergence with human authors on low-dimensional heuristic features, such as syntactic complexity and readability, they do not yet fully replicate the nuanced affective density and stylistic variance of the human-authored corpus. By isolating the specific statistical gaps in current generative mimicry, this study provides a comprehensive benchmark for LLM stylistic behavior and offers critical insights for authorship attribution in the digital humanities and social media.
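Because perplexity is singled out as the primary discriminative metric, a small sketch of how such a score is defined may help. The abstract does not say which language model or tokenizer produced the study's perplexity values, so the code below is only an illustration of the standard definition (the exponential of the negative mean log-probability per token). The add-one-smoothed unigram scorer and the function names `perplexity` and `unigram_log_probs` are hypothetical stand-ins, not the authors' pipeline.

```python
import math
from collections import Counter


def perplexity(log_probs):
    """Return exp of the negative mean natural-log token probability."""
    if not log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(log_probs) / len(log_probs))


def unigram_log_probs(tokens, reference_tokens):
    """Score tokens under an add-one-smoothed unigram model.

    Assumption: this toy model stands in for a real language model,
    which would supply contextual token probabilities instead.
    """
    counts = Counter(reference_tokens)
    vocab_size = len(counts) + 1  # one extra slot for any unseen token
    total = len(reference_tokens)
    return [math.log((counts[t] + 1) / (total + vocab_size)) for t in tokens]


# Toy usage: text that resembles the reference scores lower (more predictable).
reference = "the cat sat on the mat the cat".split()
familiar = perplexity(unigram_log_probs("the cat".split(), reference))
unusual = perplexity(unigram_log_probs("dog ran".split(), reference))
```

In this toy setting `familiar` comes out lower than `unusual`, which mirrors the abstract's point: text that a model finds uniformly predictable has low, regular perplexity, while more variable writing scores higher.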
