The rapid development and deployment of generative AI in social settings raise important questions about how best to personalize these systems for users while maintaining accuracy and realism. Using a dataset of public Facebook posts and comments, this study evaluates the ability of Llama 3.0 (70B) to predict the semantic tone of comments across different combinations of commenter and poster gender, age, and friendship closeness, and to replicate these differences in LLM-generated comments. The study consists of two parts: Part I assesses differences in semantic tone across social relationship categories, and Part II examines the similarity between human comments from Part I and comments generated by Llama 3.0 (70B) given the public Facebook posts as input. Part I results show that including social relationship information improves a model's ability to predict the semantic tone of human comments. However, Part II results show that even without social context information in the prompt, LLM-generated comments and human comments are equally sensitive to social context, suggesting that LLMs can comprehend semantics from the original post alone. When all social relationship information is included in the prompt, the similarity between human comments and LLM-generated comments decreases. This inconsistency may occur because social context information was not part of the LLMs' training data. Together, these results demonstrate the ability of LLMs to comprehend semantics from the original post and respond similarly to human commenters, but they also highlight the limitations of generalizing personalized comments through prompting alone.
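The two prompting conditions contrasted above can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the study's actual prompts: the field names (commenter/poster gender, age, friendship closeness) follow the social relationship variables named in the abstract, while the prompt wording and `build_prompt` helper are assumptions for illustration.

```python
# Hypothetical sketch of the two prompting conditions: generating a comment
# from the post alone (baseline) vs. with social relationship metadata
# (commenter/poster gender, age, friendship closeness) prepended.
from typing import Optional, Dict


def build_prompt(post: str, social_context: Optional[Dict] = None) -> str:
    """Compose a comment-generation prompt, optionally prefixed with
    social relationship information about commenter and poster."""
    lines = []
    if social_context is not None:
        lines.append(
            "You are a {commenter_gender}, age {commenter_age}, commenting "
            "on a post by a {poster_gender}, age {poster_age}; your "
            "friendship closeness is {closeness}.".format(**social_context)
        )
    lines.append(f"Write a brief comment replying to this post:\n{post}")
    return "\n".join(lines)


post = "Just finished my first marathon!"
baseline = build_prompt(post)  # post-only condition
contextual = build_prompt(post, {  # full social-context condition
    "commenter_gender": "woman", "commenter_age": 34,
    "poster_gender": "man", "poster_age": 29,
    "closeness": "close friends",
})
```

In the study's terms, `baseline` corresponds to prompting without social context, and `contextual` to including all social relationship information; the abstract's Part II finding is that the latter lowers similarity to human comments.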