Large language models (LLMs) are increasingly capable of generating personalized, persuasive text at scale, raising new questions about bias and fairness in automated communication. This paper presents the first systematic analysis of how LLMs behave when tasked with demographic-conditioned targeted messaging. We introduce a controlled evaluation framework that tests three leading models (GPT-4o, Llama-3.3, and Mistral-Large-2.1) in two generation settings: Standalone Generation, which isolates intrinsic demographic effects, and Context-Rich Generation, which incorporates thematic and regional context to emulate realistic targeting. We evaluate generated messages along three dimensions: lexical content, language style, and persuasive framing. We instantiate this framework on climate communication and find consistent age- and gender-based asymmetries across models: male- and youth-targeted messages emphasize agency, innovation, and assertiveness, while female- and senior-targeted messages stress warmth, care, and tradition. Contextual prompts systematically amplify these disparities, with persuasion scores significantly higher for messages tailored to younger or male audiences. Our findings demonstrate how demographic stereotypes can surface and intensify in LLM-generated targeted communication, underscoring the need for bias-aware generation pipelines and transparent auditing frameworks that explicitly account for demographic conditioning in socially sensitive applications.