Funding acknowledgments in scholarly publications provide large-scale trace data on the organizations that support scientific research. We present a dataset that links global science funding organizations to research publications by systematically disambiguating the unique funding acknowledgment strings extracted from publication metadata. Funder names are matched to standardized organizational identifiers through a multi-stage pipeline that combines lexical normalization, similarity-based clustering, rule-based matching, assistance from named entity recognition, and manual validation. The resulting dataset links 1.9 million unique funder strings to canonical organization identifiers. For transparency, it records the match type of each link and also lists unresolved cases. Technical validation includes paper-level comparisons across bibliometric sources and manual verification against full-text acknowledgment sections, for which we report recall and precision. The dataset supports analyses of funding flows, institutional funding portfolios, regional representation, and concentration patterns in the global research system.
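The normalization and matching stages can be illustrated with a minimal Python sketch. It assumes a small hypothetical in-memory registry, and the `ORG:` identifiers, `REGISTRY`, `normalize`, `match_funder`, and the 0.9 threshold are all illustrative rather than the dataset's actual identifiers or parameters. The sketch uses the standard library's `difflib.SequenceMatcher` ratio as a stand-in similarity measure. The clustering, rule-based, NER-assisted, and manual validation stages are not shown.

```python
import re
import unicodedata
from difflib import SequenceMatcher
from typing import Optional, Tuple

# Hypothetical registry for illustration: canonical ID -> canonical organization name.
REGISTRY = {
    "ORG:0001": "National Science Foundation",
    "ORG:0002": "National Natural Science Foundation of China",
    "ORG:0003": "European Research Council",
    "ORG:0004": "Fundação para a Ciência e a Tecnologia",
}

# Assumed similarity cut-off; a real pipeline would tune this against validated matches.
FUZZY_THRESHOLD = 0.9


def normalize(name: str) -> str:
    """Lexical normalization: strip diacritics, lowercase, drop punctuation, collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = re.sub(r"[^\w\s]", " ", stripped.lower())
    return re.sub(r"\s+", " ", no_punct).strip()


# Normalize the canonical names once so every lookup compares like with like.
NORMALIZED = {org_id: normalize(name) for org_id, name in REGISTRY.items()}


def match_funder(raw: str) -> Tuple[Optional[str], str]:
    """Map a raw acknowledgment string to (org_id, match_type).

    match_type is "exact", "fuzzy", or "unresolved"; unresolved strings are
    reported rather than forced onto the closest candidate.
    """
    key = normalize(raw)
    # Stage 1: exact match after normalization.
    for org_id, canon in NORMALIZED.items():
        if key == canon:
            return org_id, "exact"
    # Stage 2: best similarity score across all canonical names.
    best_id, best_score = None, 0.0
    for org_id, canon in NORMALIZED.items():
        score = SequenceMatcher(None, key, canon).ratio()
        if score > best_score:
            best_id, best_score = org_id, score
    if best_score >= FUZZY_THRESHOLD:
        return best_id, "fuzzy"
    return None, "unresolved"


if __name__ == "__main__":
    samples = [
        "National Science Foundation",
        "National Science Foundation (NSF)",
        "Natl. Science Foundation",
        "Fundacao para a Ciencia e a Tecnologia",
        "Wellcome Trust",
    ]
    for s in samples:
        print(f"{s!r} -> {match_funder(s)}")
```

Trying exact matches before fuzzy ones keeps the cheap, unambiguous cases out of the similarity step. Returning an explicit `"unresolved"` label, rather than always taking the nearest candidate, mirrors the idea of recording match types and unresolved cases alongside the links.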