We expose a surprising failure of generalization in auto-regressive large language models (LLMs). If a model is trained on a sentence of the form "A is B", it will not automatically generalize to the reverse direction "B is A". This is the Reversal Curse. For instance, if a model is trained on "Valentina Tereshkova was the first woman to travel to space", it will not automatically be able to answer the question, "Who was the first woman to travel to space?". Moreover, the likelihood of the correct answer ("Valentina Tereshkova") will not be higher than for a random name. Thus, models do not generalize a prevalent pattern in their training set: if "A is B" occurs, "B is A" is more likely to occur. It is worth noting, however, that if "A is B" appears in context, models can deduce the reverse relationship. We provide evidence for the Reversal Curse by finetuning GPT-3 and Llama-1 on fictitious statements such as "Uriah Hawthorne is the composer of Abyssal Melodies" and showing that they fail to correctly answer "Who composed Abyssal Melodies?". The Reversal Curse is robust across model sizes and model families and is not alleviated by data augmentation. We also evaluate ChatGPT (GPT-3.5 and GPT-4) on questions about real-world celebrities, such as "Who is Tom Cruise's mother? [A: Mary Lee Pfeiffer]" and the reverse "Who is Mary Lee Pfeiffer's son?". GPT-4 correctly answers questions like the former 79% of the time, compared to 33% for the latter. Code is available at: https://github.com/lukasberglund/reversal_curse.
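To make the forward and reverse comparison concrete, the following is a minimal sketch of how such an evaluation could be scored. It is an illustration under stated assumptions, not the code from the repository above: the prompt template (`"<name> is"`), the substring-match scoring, and the helper names `build_examples`, `accuracy`, and `forward_only_model` are all hypothetical. The toy model is a lookup table that memorizes only the trained direction, standing in for a finetuned LLM.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (prompt, expected answer)


def build_examples(facts: List[Tuple[str, str]]) -> Tuple[List[Example], List[Example]]:
    """Turn (name, description) facts into forward and reverse prompts.

    Assumed template: "<name> is" should be completed with the description,
    and "<Description> is" should be completed with the name.
    """
    forward, reverse = [], []
    for name, description in facts:
        forward.append((f"{name} is", description))
        reverse.append((f"{description[0].upper() + description[1:]} is", name))
    return forward, reverse


def accuracy(model: Callable[[str], str], examples: List[Example]) -> float:
    """Fraction of prompts whose completion contains the expected answer."""
    if not examples:
        return 0.0
    hits = sum(expected.lower() in model(prompt).lower() for prompt, expected in examples)
    return hits / len(examples)


def forward_only_model(facts: List[Tuple[str, str]]) -> Callable[[str], str]:
    """Hypothetical stand-in for a finetuned model: it recalls a fact only
    when prompted in the direction it was trained on."""
    memory = {f"{name} is": description for name, description in facts}
    return lambda prompt: memory.get(prompt, "")


# The fictitious fact quoted in the abstract.
FACTS = [("Uriah Hawthorne", "the composer of Abyssal Melodies")]

forward, reverse = build_examples(FACTS)
toy = forward_only_model(FACTS)
forward_acc = accuracy(toy, forward)
reverse_acc = accuracy(toy, reverse)
print(f"forward accuracy: {forward_acc:.2f}, reverse accuracy: {reverse_acc:.2f}")
```

Replacing `toy` with a function that queries a real model would give the same two numbers for that model. A large gap between them is the pattern the abstract describes.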