Large Language Models Reflect Human Citation Patterns with a Heightened Citation Bias

Citation practices are crucial in shaping the structure of scientific knowledge, yet they are often influenced by contemporary norms and biases. The emergence of Large Language Models (LLMs) like GPT-4 introduces a new dynamic to these practices. However, the characteristics and potential biases of references recommended by LLMs that rely entirely on their parametric knowledge, rather than on search or retrieval-augmented generation, remain unexplored. Here, we analyze these characteristics in an experiment using a dataset of 166 papers from AAAI, NeurIPS, ICML, and ICLR, published after GPT-4's knowledge cut-off date and encompassing 3,066 references in total. In our experiment, GPT-4 was tasked with suggesting scholarly references for the anonymized in-text citations within these papers. Our findings reveal a remarkable similarity between human and LLM citation patterns, but with a more pronounced bias toward highly cited papers in GPT-4, which persists even after controlling for publication year, title length, number of authors, and venue. Additionally, we observe strong consistency between the characteristics of the GPT-4-generated references that exist and those that do not, indicating the model's internalization of citation patterns. By analyzing citation graphs, we show that the references recommended by GPT-4 are embedded in the relevant citation context, suggesting an even deeper conceptual internalization of the citation networks. While LLMs can aid in citation generation, they may also amplify existing biases and introduce new ones, potentially skewing scientific knowledge dissemination. Our results underscore the need for identifying the model's biases and for developing balanced methods to interact with LLMs in general.
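
The abstract does not say how the in-text citations were anonymized, so the following is only a minimal sketch of one way such a step could look, assuming numeric bracketed citations such as [12] or [3, 4]. The function name `anonymize_citations`, the regular expression, and the `[CITATION]` placeholder are all hypothetical and are not taken from the paper.

```python
import re

# Assumption: citations are numeric and bracketed, e.g. [12], [3, 4], or [5-7].
# Author-year styles such as (Smith et al., 2020) would need a different pattern.
CITATION_PATTERN = re.compile(r"\[\d+(?:\s*[,\-–]\s*\d+)*\]")


def anonymize_citations(text: str, placeholder: str = "[CITATION]") -> tuple[str, int]:
    """Replace each numeric in-text citation with a placeholder.

    Returns the masked text and the number of citations that were replaced.
    """
    return CITATION_PATTERN.subn(placeholder, text)


if __name__ == "__main__":
    masked, count = anonymize_citations(
        "Transformers [12] build on earlier attention mechanisms [3, 4]."
    )
    print(masked)
    print(count)
```

Under these assumptions, each masked position could then be handed to a model as a slot to fill with a suggested reference, while the original reference list is kept aside for comparison.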
