Modern language modeling tasks are often underspecified: for a given token prediction, many words may satisfy the user's intent of producing natural language at inference time; however, only one word will minimize the task's loss function at training time. We introduce a simple causal mechanism to describe the role underspecification plays in the generation of spurious correlations. Despite its simplicity, our causal model directly informs the development of two lightweight black-box evaluation methods, which we apply to gendered pronoun resolution tasks on a wide range of LLMs to 1) aid in the detection of inference-time task underspecification by 2) exploiting previously unreported gender vs. time and gender vs. location spurious correlations on LLMs with a range of A) sizes: from BERT-base to GPT-4 Turbo Preview, B) pre-training objectives: from masked & autoregressive language modeling to a mixture of these objectives, and C) training stages: from pre-training only to reinforcement learning from human feedback (RLHF). Code and open-source demos are available at https://github.com/2dot71mily/uspec.