In this paper, we examine the concept of "alignment" in large language models (LLMs) through the lens of post-structuralist socio-political theory, focusing on its parallels to empty signifiers. To establish a shared vocabulary for how abstract concepts of alignment are operationalised in empirical datasets, we propose a framework that demarcates 1) which dimensions of model behaviour are considered important, and 2) how meanings and definitions are ascribed to these dimensions, and by whom. We situate existing empirical literature within this framework and provide guidance on deciding which paradigm to follow. Through this framework, we aim to foster a culture of transparency and critical evaluation, aiding the community in navigating the complexities of aligning LLMs with human populations.