Transformer language models have received widespread public attention, yet their generated text is often surprising even to NLP researchers. In this survey, we discuss over 250 recent studies of English language model behavior before task-specific fine-tuning. Language models possess basic capabilities in syntax, semantics, pragmatics, world knowledge, and reasoning, but these capabilities are sensitive to specific inputs and surface features. Despite dramatic increases in generated text quality as models scale to hundreds of billions of parameters, the models are still prone to non-factual responses, commonsense errors, memorized text, and social biases. Many of these weaknesses can be framed as over-generalizations or under-generalizations of learned patterns in text. We synthesize recent results to highlight what is currently known about large language model capabilities, thus providing a resource for applied work and for research in adjacent fields that use language models.