It is desirable to coarsely classify short scientific texts, such as grant or publication abstracts, for strategic insight or research portfolio management. These texts efficiently transmit dense information to experts, who draw on a rich body of knowledge to interpret them. Yet this task is remarkably difficult to automate because such texts are brief and lack context. To address this gap, we have developed a novel approach to generate and appropriately assign coarse domain-specific labels. We show that a Large Language Model (LLM) can provide metadata essential to the task, in a process akin to augmenting the text with supplemental knowledge that represents human intuition, and we propose a workflow. As a pilot study, we use a corpus of award abstracts from the National Aeronautics and Space Administration (NASA). We develop new assessment tools in concert with established performance metrics.