Recent reports claim that Large Language Models (LLMs) derive new science and exhibit human-level general intelligence. Such claims are entangled with two different narratives about what LLMs do: one in which they are an engine of synthesis that genuinely reasons its way to new knowledge, and one in which they retrieve and re-emit the work of others without attribution. In the scientific setting, these are best understood as a contrast between \emph{reasoning} and \emph{plagiarism}. Determining where the truth lies between these two narratives is difficult because central components of the model -- the training data and the interaction transcript -- remain opaque. As a result, claims of LLM reasoning cannot be independently tested and so do not satisfy Popper's refutability principle. We propose guidelines for transparency and reproducibility that will allow reasoning claims to be studied using the scientific method. We suggest that the dominance of the reasoning narrative is, in practice, encouraging plagiarism in the scientific literature, and we discuss what might be done about it.