CoMPosT: Characterizing and Evaluating Caricature in LLM Simulations

Recent work has aimed to capture nuances of human behavior by using LLMs to simulate responses from particular demographics in settings like social science experiments and public opinion surveys. However, there are currently no established ways to discuss or evaluate the quality of such LLM simulations. Moreover, there is growing concern that these LLM simulations are flattened caricatures of the personas that they aim to simulate, failing to capture the multidimensionality of people and perpetuating stereotypes. To bridge these gaps, we present CoMPosT, a framework to characterize LLM simulations using four dimensions: Context, Model, Persona, and Topic. We use this framework to measure open-ended LLM simulations' susceptibility to caricature, defined via two criteria: individuation and exaggeration. We evaluate the level of caricature in scenarios from existing work on LLM simulations. We find that for GPT-4, simulations of certain demographics (political and marginalized groups) and topics (general, uncontroversial) are highly susceptible to caricature.

