Large language models (LLMs) have shown promise in representing individuals and communities, offering new ways to study complex social dynamics. However, effectively aligning LLMs with specific human groups and systematically assessing the fidelity of that alignment remain open challenges. This paper presents a robust framework for aligning LLMs with online communities via instruction-tuning and for comprehensively evaluating the alignment across multiple aspects of language, including authenticity, emotional tone, toxicity, and harm. We demonstrate the utility of our approach by applying it to online communities centered on dieting and body image. We administer an eating disorder psychometric test to the aligned LLMs to reveal unhealthy beliefs and successfully differentiate communities with varying levels of eating disorder risk. Our results highlight the potential of LLMs in automated moderation and in broader public health and social science research.