Identifying critical research within the growing body of academic work is an essential element of quality research. Systematic review processes, used in evidence-based medicine, formalise this as a procedure that must be followed in a research program. However, as the volume of literature grows, so does the time required to identify the important articles for a given topic. In this work, we develop a method for building a general-purpose filtering system that matches a research question, posed as a natural language description of the required content, against a candidate set of articles obtained by applying broad search terms. Our results demonstrate that transformer models, pre-trained on biomedical literature and then fine-tuned for the specific task, offer a promising solution to this problem. The model can remove large volumes of irrelevant articles for most research questions.