Investigators, funders, and the public want to know which topics and trends characterize publicly funded research, but current manual categorization efforts are limited in scale and in the depth of understanding they provide. We developed a semi-automated approach to extract and name research topics, and applied it to $1.9B of NCI funding over 21 years in the radiological sciences to determine micro- and macro-scale research topics and funding trends. Our method relies on sequential clustering of existing biomedical word embeddings, naming of the resulting clusters by subject matter experts, and visualization to discover trends at a macroscopic scale above individual topics. We present results using 15 and 60 cluster topics, and find that a 2D projection of grant embeddings reveals two dominant axes: physics-biology and therapeutic-diagnostic. For our dataset, funding for therapeutics- and physics-based research has outpaced funding for diagnostics- and biology-based research, respectively. We hope these results may (1) give funders insight into the appropriateness of their funding allocation, (2) help investigators contextualize their work and explore neighboring research domains, and (3) allow the public to review where their tax dollars are being allocated.
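The pipeline described above (cluster grant embeddings, then project them to 2D and aggregate funding per topic) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the embeddings, funding amounts, and cluster counts are synthetic; "sequential clustering" is interpreted here as coarse k-means followed by k-means within each coarse cluster; and PCA stands in for whatever 2D projection the authors used. The helper names `kmeans`, `sequential_clusters`, and `project_2d` are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (illustrative); returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Fancy indexing copies, so updating centroids does not modify X.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every centroid.
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Final assignment against the last centroid update.
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1), centroids


def sequential_clusters(X, k_coarse, k_fine_per_coarse, seed=0):
    """Assumed two-stage scheme: coarse clusters, then sub-clusters in each."""
    coarse, _ = kmeans(X, k_coarse, seed=seed)
    fine = np.zeros(len(X), dtype=int)
    for c in range(k_coarse):
        idx = np.flatnonzero(coarse == c)
        k = min(k_fine_per_coarse, len(idx))
        if k == 0:
            continue
        sub, _ = kmeans(X[idx], k, seed=seed)
        # Fine labels are numbered so that fine // k_fine_per_coarse == coarse.
        fine[idx] = c * k_fine_per_coarse + sub
    return coarse, fine


def project_2d(X):
    """PCA via SVD: project centered embeddings onto the top two components."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T


# Synthetic stand-ins: 200 "grants" with 16-dimensional embeddings and
# made-up funding amounts (in $M). Real inputs would come from grant text.
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(200, 16))
funding = rng.uniform(0.1, 2.0, size=200)

K_COARSE, K_FINE = 3, 2  # toy sizes; the abstract reports 15 and 60 topics
coarse, fine = sequential_clusters(embeddings, K_COARSE, K_FINE)
xy = project_2d(embeddings)

# Total funding per coarse topic.
funding_per_topic = np.bincount(coarse, weights=funding, minlength=K_COARSE)
print(funding_per_topic)
```

In a real analysis, the cluster labels would then be handed to subject matter experts for naming, and the 2D coordinates in `xy` would be plotted and colored by topic to look for large-scale axes.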