Determining Research Priorities Using Machine Learning

We summarize our exploratory investigation into whether machine learning (ML) techniques applied to publicly available professional text can substantially augment strategic planning for astronomy. We find that an approach based on Latent Dirichlet Allocation (LDA), using content drawn from astronomy journal papers, can be used to infer high-priority research areas. Although the LDA topic models are challenging to interpret directly, we find that they may be strongly associated with meaningful keywords and scientific papers, which allows human interpretation of the topics. We find significant correlation between the results of applying these models to the previous decade of astronomical research (the "1998-2010" corpus) and the contents of the science frontier panel report, which contains the high-priority research areas identified by the 2010 National Academies' Astronomy and Astrophysics Decadal Survey (the "DS2010" corpus). Significant correlations also exist between the model results for the 1998-2010 corpus and the whitepapers submitted to the Decadal Survey (the "whitepapers" corpus). Importantly, we derive predictive metrics from these results that can serve as leading indicators of which content captured by the topic models will become highly cited in the future. Using these metrics, together with the associations between papers and topics, it is possible to identify important papers for planners to consider. A preliminary version of our work was presented by Thronson et al. (2021) and Thomas et al. (2022).
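To make the LDA step concrete, the following is a minimal sketch of topic modeling by collapsed Gibbs sampling, written in plain Python. It is an illustration under stated assumptions, not the pipeline used in this work: the toy tokenized "abstracts", the function names (`fit_lda`, `top_keywords`, `topic_shares`), the number of topics, and the hyperparameters `alpha` and `beta` are all hypothetical. It shows the two associations the abstract relies on: each topic is linked to its highest-weight keywords, and each document is linked to a mixture of topics.

```python
import random

def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized documents by collapsed Gibbs sampling."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    # Count tables: document-topic, topic-word, and per-topic totals.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1
    return vocab, doc_topic, topic_word

def top_keywords(vocab, topic_word, k=3):
    """Return the k highest-count words for each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:k]]
        for row in topic_word
    ]

def topic_shares(doc_topic, alpha=0.1):
    """Return smoothed per-document topic proportions (each row sums to 1)."""
    shares = []
    for row in doc_topic:
        denom = sum(row) + alpha * len(row)
        shares.append([(count + alpha) / denom for count in row])
    return shares

# Hypothetical toy corpus: already tokenized, stop words removed.
docs = [
    ["exoplanet", "transit", "atmosphere", "exoplanet", "spectra"],
    ["galaxy", "redshift", "cluster", "galaxy", "survey"],
    ["transit", "exoplanet", "spectra", "atmosphere"],
    ["redshift", "survey", "cluster", "galaxy"],
]
vocab, doc_topic, topic_word = fit_lda(docs, n_topics=2)
keywords = top_keywords(vocab, topic_word)
shares = topic_shares(doc_topic)
for t, words in enumerate(keywords):
    print(f"topic {t}: {', '.join(words)}")
for d, row in enumerate(shares):
    print(f"doc {d}: {[round(p, 2) for p in row]}")
```

The keyword lists are what a human reader would inspect to label a topic, and the per-document proportions are what would link individual papers to topics. A production analysis would more plausibly use an established LDA library and a far larger corpus; the sampler is written out here only so that the example has no external dependencies.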
