This paper presents a case study on the design, administration, post-processing, and evaluation of surveys on large language models (LLMs). It comprises two components. (1) A statistical method for eliciting beliefs encoded in LLMs: we introduce statistical measures and evaluation metrics that quantify the probability of an LLM "making a choice", the associated uncertainty, and the consistency of that choice. (2) An application of this method to study what moral beliefs are encoded in different LLMs, especially in ambiguous cases where the right choice is not obvious. We design a large-scale survey comprising 680 high-ambiguity moral scenarios (e.g., "Should I tell a white lie?") and 687 low-ambiguity moral scenarios (e.g., "Should I stop for a pedestrian on the road?"). Each scenario includes a description, two possible actions, and auxiliary labels indicating violated rules (e.g., "do not kill"). We administer the survey to 28 open- and closed-source LLMs. We find that (a) in unambiguous scenarios, most models "choose" actions that align with commonsense, whereas in ambiguous cases, most models express uncertainty; (b) some models are uncertain about choosing the commonsense action because their responses are sensitive to the question wording; and (c) some models reflect clear preferences in ambiguous scenarios; specifically, closed-source models tend to agree with each other.
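To make the three quantities named in the abstract concrete (probability of a choice, uncertainty, and consistency), the following is a minimal sketch and not the paper's actual implementation. It assumes that each scenario is posed under several question wordings, that sampled responses have already been mapped to one of the two actions (labelled "A" and "B" here), and that consistency can be scored as one minus a generalized Jensen-Shannon divergence across wordings. The function names, the labels, and the choice of divergence are all illustrative assumptions.

```python
import math
from collections import Counter


def action_distribution(responses):
    """Empirical distribution over actions "A" and "B".

    Assumption: responses are already mapped to action labels; anything
    else (refusals, unparseable answers) is dropped before counting.
    """
    valid = [r for r in responses if r in ("A", "B")]
    if not valid:
        raise ValueError("no valid responses to estimate a distribution from")
    counts = Counter(valid)
    return [counts["A"] / len(valid), counts["B"] / len(valid)]


def entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def summarize(samples_by_form):
    """Summarize one scenario from samples grouped by question wording.

    samples_by_form: dict mapping a (hypothetical) wording name to a list
    of sampled action labels.
    """
    dists = [action_distribution(r) for r in samples_by_form.values()]
    k = len(dists)
    # Probability of each action, averaged uniformly over wordings.
    marginal = [sum(d[i] for d in dists) / k for i in range(2)]
    # Uncertainty of the averaged choice (0 = certain, 1 = coin flip).
    uncertainty = entropy(marginal)
    # Generalized Jensen-Shannon divergence across wordings; with two
    # actions and base-2 logs it lies in [0, 1].
    jsd = uncertainty - sum(entropy(d) for d in dists) / k
    return {
        "marginal": marginal,
        "entropy": uncertainty,
        "consistency": 1.0 - jsd,
    }


if __name__ == "__main__":
    # Illustrative, made-up samples for a single scenario.
    samples = {
        "wording_1": ["A"] * 8 + ["B"] * 2,
        "wording_2": ["A"] * 6 + ["B"] * 4,
    }
    print(summarize(samples))
```

Under these assumptions, a model that gives the same answer regardless of wording scores a consistency of 1, while a model whose answer flips entirely with the wording scores 0 and also shows maximal entropy in its averaged choice, which mirrors the wording-sensitivity effect described in finding (b).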