Recent research indicates that pre-trained Large Language Models (LLMs) possess cognitive constructs similar to those observed in humans, prompting researchers to investigate the cognitive aspects of LLMs. This paper focuses on explicit and implicit social bias, a distinctive two-level cognitive construct in psychology. Psychological theory posits that an individual's explicit social bias, the bias they consciously express in their statements, may differ from their implicit social bias, the bias they hold unconsciously. We propose a two-stage approach and discover a parallel phenomenon in LLMs, termed "re-judge inconsistency" in social bias. In the first stage, the LLM is tasked with automatically completing statements, and its completions may incorporate implicit social bias. In the second stage, the same LLM re-judges the biased statement it generated and contradicts it. We propose that this re-judge inconsistency is analogous to the inconsistency between humans' implicit social bias, of which they are unaware, and their explicit social bias, of which they are aware. Experiments on ChatGPT and GPT-4, covering common gender biases examined in psychology, show that the re-judge inconsistency is highly stable. This finding may suggest that diverse cognitive constructs emerge as LLMs' capabilities strengthen. Consequently, psychological theories can offer better insight into the mechanisms that govern how explicit and implicit constructs are expressed in LLMs.
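The two-stage procedure described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: `model` stands for any text-in, text-out callable wrapping an LLM, `is_biased` is a hypothetical predicate that flags a completed statement as biased, and the prompt wording and the "disagree" check are placeholders rather than the authors' actual prompts or scoring.

```python
from typing import Callable, Iterable


def rejudge_inconsistency_rate(
    model: Callable[[str], str],       # hypothetical wrapper around an LLM
    stems: Iterable[str],              # statement beginnings to be completed
    is_biased: Callable[[str], bool],  # hypothetical bias detector
) -> float:
    """Fraction of biased self-generated statements the model later rejects."""
    judged = 0
    contradicted = 0
    for stem in stems:
        # Stage 1: the model completes the statement on its own.
        completion = model(f"Complete the following statement: {stem}")
        statement = f"{stem} {completion}".strip()

        # Only statements that came out biased are passed to stage 2.
        if not is_biased(statement):
            continue
        judged += 1

        # Stage 2: the same model re-judges its own statement.
        verdict = model(
            "Do you agree with the following statement? "
            f"Answer 'agree' or 'disagree'.\nStatement: {statement}"
        )
        if verdict.strip().lower().startswith("disagree"):
            contradicted += 1

    # Avoid division by zero when no completion was flagged as biased.
    return contradicted / judged if judged else 0.0
```

Under these assumptions, a rate close to 1.0 would mean the model almost always rejects the biased statements it produced itself, which is the pattern the abstract calls re-judge inconsistency.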