Is AI Catching Up to Human Expression? Exploring Emotion, Personality, Authorship, and Linguistic Style in English and Arabic with Six Large Language Models


Nasser A Alsadhan

From arXiv. Preprint, under review.

The advancing fluency of LLMs raises important questions about their ability to emulate complex human traits, including emotional expression and personality, across diverse linguistic and cultural contexts. This study investigates whether LLMs can convincingly mimic emotional nuance in English and personality markers in Arabic, a critical under-resourced language with unique linguistic and cultural characteristics. We conduct two tasks across six models: Jais, Mistral, LLaMA, GPT-4o, Gemini, and DeepSeek. First, we evaluate whether machine classifiers can reliably distinguish between human-authored and AI-generated texts. Second, we assess the extent to which LLM-generated texts exhibit emotional or personality traits comparable to those of humans. Our results demonstrate that AI-generated texts are distinguishable from human-authored ones (F1 > 0.95), though classification performance deteriorates on paraphrased samples, indicating a reliance on superficial stylistic cues. Emotion and personality classification experiments reveal significant generalization gaps: classifiers trained on human data perform poorly on AI-generated texts and vice versa, suggesting that LLMs encode affective signals differently from humans. Importantly, augmenting training with AI-generated data enhances performance in the Arabic personality classification task, highlighting the potential of synthetic data to address challenges in under-resourced languages. Model-specific analyses show that GPT-4o and Gemini exhibit superior affective coherence. Linguistic and psycholinguistic analyses reveal measurable divergences in tone, authenticity, and textual complexity between human and AI texts. These findings have implications for affective computing, authorship attribution, and responsible AI deployment, particularly within under-resourced language contexts where generative AI detection and alignment pose unique challenges.
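The cross-domain protocol described in the abstract (train on one text source, test on the other, and compare F1) can be sketched as follows. This is only an illustrative sketch, not the author's implementation: the abstract does not say which F1 variant or which classifiers were used, so macro-averaged F1 is an assumption here, and `f1_score`, `macro_f1`, `train_majority`, and `cross_domain_grid` are hypothetical names. The majority-class baseline is a placeholder for a real classifier.

```python
from collections import Counter


def f1_score(y_true, y_pred, positive):
    """Binary F1 for one label, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 over labels present in y_true.

    Assumption: the abstract does not state the averaging scheme.
    """
    labels = sorted(set(y_true))
    return sum(f1_score(y_true, y_pred, label) for label in labels) / len(labels)


def train_majority(texts, labels):
    """Placeholder trainer: always predicts the most frequent training label."""
    majority = Counter(labels).most_common(1)[0][0]
    return lambda text: majority


def cross_domain_grid(train_fn, domains):
    """Train on each source domain and score on every target domain.

    `domains` maps a name (e.g. "human", "ai") to a tuple
    (train_texts, train_labels, test_texts, test_labels).
    Returns {(source, target): macro F1}.
    """
    grid = {}
    for source, (train_x, train_y, _, _) in domains.items():
        predict = train_fn(train_x, train_y)
        for target, (_, _, test_x, test_y) in domains.items():
            predictions = [predict(text) for text in test_x]
            grid[(source, target)] = macro_f1(test_y, predictions)
    return grid
```

In such a grid, a generalization gap of the kind the abstract reports would show up as off-diagonal entries (for example, trained on human text and tested on AI text) that are clearly lower than the diagonal ones. The augmentation setting could be approximated by adding a third domain whose training split concatenates the human and AI training data.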

