AI systems based on artificial neural networks are being developed with the aspiration of pushing the boundaries of human mathematical knowledge. A key question for these systems is how far they can reach beyond their training data. Mathematical discovery requires a strong form of out-of-distribution generalization: the ability to hypothesize genuinely new, and potentially logically more powerful, mathematical structures. It has been hypothesized that language abilities support such generalizations in human cognition. In this work, we use simple arithmetic as a case study for examining how modern AI models could expand their mathematical horizons, evaluating whether these models can independently discover the concept of "zero". We show that (1) GPT-2-sized language models are unable to perform this generalization at test time, regardless of language pretraining, but (2) models improve substantially after training on tens or hundreds of examples of zero. Additionally, we find that language pretraining reduces the number of required examples by approximately $50\%$, showing that language abilities can scaffold mathematical discovery in neural models.