The performance of medical research can be viewed and evaluated not only from the perspective of publication output, but also from the perspective of economic exploitability. Patents can represent the exploitation of research results and thus the transfer of knowledge from research to industry. In this study, we set out to identify publication-patent pairs in order to use patents as a proxy for the economic impact of research. To identify these pairs, we matched scholarly publications and patents by comparing the names of authors and inventors. To resolve the ambiguities that arise in this name-matching process, we expanded our approach with two additional filter features, one used to assess the similarity of text content, the other to identify common references in the two document types. To evaluate text similarity, we extracted and transformed technical terms from a medical ontology (MeSH) into numerical vectors using word embeddings. We then calculated the results of the two supporting features over an example five-year period. Furthermore, we developed a statistical procedure which can be used to determine valid patent classes for the domain of medicine. Our complete data processing pipeline is freely available, from the raw data of the two document types right through to the validated publication-patent pairs.
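
The matching idea described above can be illustrated with a minimal sketch. It is not the study's pipeline: the name normalization rule, the record layout (`authors`, `inventors`, `terms`, `refs`), the function names, and the two-dimensional toy embeddings are all assumptions made for illustration. The sketch compares normalized author and inventor names, then computes the two supporting features: cosine similarity between averaged term vectors, and the number of shared references.

```python
import math
import unicodedata


def normalize_name(name):
    """Reduce 'Given Surname' to 'surname initial' (assumed rule, not the study's)."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    parts = ascii_name.lower().replace(",", " ").replace(".", " ").split()
    if not parts:
        return ""
    return f"{parts[-1]} {parts[0][0]}"


def mean_vector(terms, embeddings):
    """Average the vectors of all terms that have an embedding; None if none do."""
    vectors = [embeddings[t] for t in terms if t in embeddings]
    if not vectors:
        return None
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def score_pair(publication, patent, embeddings):
    """Compute the name overlap and the two supporting features for one candidate pair."""
    authors = {normalize_name(n) for n in publication["authors"]}
    inventors = {normalize_name(n) for n in patent["inventors"]}
    pub_vec = mean_vector(publication["terms"], embeddings)
    pat_vec = mean_vector(patent["terms"], embeddings)
    similarity = cosine(pub_vec, pat_vec) if pub_vec and pat_vec else 0.0
    return {
        "shared_names": authors & inventors,
        "text_similarity": similarity,
        "shared_refs": len(publication["refs"] & patent["refs"]),
    }


# Hypothetical toy data; real embeddings would come from a trained model.
embeddings = {"insulin": [1.0, 0.0], "diabetes": [0.0, 1.0]}
publication = {
    "authors": ["Jörg Müller", "Ana Silva"],
    "terms": ["insulin", "diabetes"],
    "refs": {"r1", "r2"},
}
patent = {"inventors": ["J. Muller"], "terms": ["insulin"], "refs": {"r2", "r3"}}

result = score_pair(publication, patent, embeddings)
print(result)
```

A name match alone would accept this pair; the similarity score and the shared-reference count give additional evidence that can be thresholded to discard coincidental namesakes. Any such thresholds would need to be chosen from validated data.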