Large language models (LLMs) are increasingly capable of generating personalized, persuasive text at scale, raising new questions about bias and fairness in automated communication. This paper presents the first systematic analysis of how LLMs behave when tasked with demographic-conditioned targeted messaging. We introduce a controlled evaluation framework using three leading models -- GPT-4o, Llama-3.3, and Mistral-Large 2.1 -- across two generation settings: Standalone Generation, which isolates intrinsic demographic effects, and Context-Rich Generation, which incorporates thematic and regional context to emulate realistic targeting. We evaluate generated messages along three dimensions: lexical content, language style, and persuasive framing. We instantiate this framework on climate communication and find consistent age- and gender-based asymmetries across models: male- and youth-targeted messages emphasize agency, innovation, and assertiveness, while female- and senior-targeted messages stress warmth, care, and tradition. Contextual prompts systematically amplify these disparities, with persuasion scores significantly higher for messages tailored to younger or male audiences. Our findings demonstrate how demographic stereotypes can surface and intensify in LLM-generated targeted communication, underscoring the need for bias-aware generation pipelines and transparent auditing frameworks that explicitly account for demographic conditioning in socially sensitive applications.