In the rapidly evolving field of artificial intelligence (AI), mapping innovation patterns and understanding how research is effectively transferred into applications are essential for economic growth. However, existing data infrastructures suffer from fragmentation, incomplete coverage, and insufficient evaluative capacity. Here, we present DeepInnovationAI, a comprehensive global dataset consisting of three structured files. DeepPatentAI.csv contains 2,356,204 patent records with 8 field-specific attributes. DeepDiveAI.csv contains 3,511,929 academic publications with 13 metadata fields. To build these two files, we use large language models, multilingual text analysis, and dual-layer BERT classifiers to identify AI-related content accurately, and we apply hypergraph analysis to construct robust innovation metrics. The third file, DeepCosineAI.csv, applies semantic vector proximity analysis to provide approximately one hundred million computed paper-patent similarity pairs, shedding light on how theoretical advances translate into commercial technologies. DeepInnovationAI enables researchers, policymakers, and industry leaders to anticipate trends and identify collaboration opportunities. Its extensive temporal and geographical coverage supports detailed analysis of technological development patterns and international competition dynamics, establishing a foundation for modeling AI innovation and technology transfer processes.
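To make "semantic vector proximity analysis" concrete, the following is a minimal sketch of scoring paper-patent pairs by cosine similarity of their embedding vectors. It assumes each paper and patent has already been mapped to a fixed-length embedding. The abstract does not specify the embedding model or how the roughly one hundred million pairs were selected from all possible combinations, so the toy vectors, the hypothetical helper names `cosine_similarity` and `top_k_patent_matches`, and the top-k-per-paper selection rule are illustrative assumptions, not the authors' pipeline.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_patent_matches(
    papers: Dict[str, Vector], patents: Dict[str, Vector], k: int = 2
) -> List[Tuple[str, str, float]]:
    """For each paper, keep its k most similar patents.

    Hypothetical selection rule: the source does not state how pairs were filtered.
    """
    pairs: List[Tuple[str, str, float]] = []
    for paper_id, paper_vec in papers.items():
        # Score this paper against every patent, then keep the k highest scores.
        scored = [
            (patent_id, cosine_similarity(paper_vec, patent_vec))
            for patent_id, patent_vec in patents.items()
        ]
        scored.sort(key=lambda item: item[1], reverse=True)
        for patent_id, score in scored[:k]:
            pairs.append((paper_id, patent_id, score))
    return pairs


if __name__ == "__main__":
    # Toy 3-dimensional embeddings standing in for real model outputs.
    papers = {"paper_A": [1.0, 0.0, 1.0], "paper_B": [0.0, 1.0, 0.0]}
    patents = {
        "patent_X": [1.0, 0.0, 0.9],
        "patent_Y": [0.0, 2.0, 0.1],
        "patent_Z": [0.5, 0.5, 0.5],
    }
    for paper_id, patent_id, score in top_k_patent_matches(papers, patents, k=2):
        print(f"{paper_id}\t{patent_id}\t{score:.3f}")
```

At the scale described (about 3.5 million papers and 2.4 million patents), an exhaustive pairwise loop like this would be impractical; a real pipeline would more likely use batched matrix operations or an approximate nearest-neighbour index, but the similarity measure itself is the same.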