The rapid growth of scholarly publishing across disciplines has made it increasingly difficult to evaluate systematically how research contributes to the United Nations Sustainable Development Goals (SDGs). Manual classification of research articles by domain experts is impractical at this volume: it is time-consuming, costly, and may yield inconsistent results. To address these challenges, this paper proposes an automated, rule-based computational framework that classifies research papers by SDG using expert-curated Boolean query mappings. The proposed system comprises a web-based interface for data input and result display, a backend application programming interface (API) for high-throughput processing, and a Python-based classification engine that applies structured Boolean expressions to bibliographic metadata (titles, abstracts, and keywords). The framework supports both single-paper and batch classification and produces transparent outputs that show which query components motivated each SDG assignment. Experiments on large bibliographic datasets show that the system can process thousands of research records within an hour with reproducible and consistent results. The approach offers institutions, researchers, and policymakers a practical way to analyze the alignment of research with sustainability goals systematically, without relying on machine learning models whose behavior is not easily interpretable.
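To make the classification idea concrete, the following is a minimal sketch of how Boolean query matching with explainable output might look in Python. It is not the paper's implementation: the query representation (nested tuples), the two example queries in `SDG_QUERIES`, and the function names `evaluate` and `classify` are all assumptions introduced here for illustration, and the actual expert-curated queries are not reproduced.

```python
import re

# Hypothetical query format: a leaf is a search term (string); an inner node
# is a tuple ("AND" | "OR" | "NOT", child, ...). These two queries are
# illustrative placeholders, not the expert-curated mappings from the paper.
SDG_QUERIES = {
    "SDG 6": ("AND", ("OR", "water", "sanitation"),
                     ("OR", "access", "quality", "treatment")),
    "SDG 7": ("AND", ("OR", "solar", "wind", "renewable energy"),
                     ("NOT", "solar system")),
}


def evaluate(query, text):
    """Return (matched, evidence): whether `query` holds for `text`, and the
    terms that made it hold. Failed branches contribute no evidence."""
    if isinstance(query, str):
        # Whole-word match, so that "water" does not fire on "wastewater".
        found = re.search(r"\b" + re.escape(query) + r"\b", text) is not None
        return found, ([query] if found else [])

    op, *children = query
    results = [evaluate(child, text) for child in children]
    if op == "AND":
        ok = all(hit for hit, _ in results)
    elif op == "OR":
        ok = any(hit for hit, _ in results)
    elif op == "NOT":
        # A negation is satisfied by absence, so it has no terms to report.
        return (not results[0][0]), []
    else:
        raise ValueError(f"unknown operator: {op!r}")

    evidence = [t for hit, terms in results if hit for t in terms] if ok else []
    return ok, evidence


def classify(record):
    """Map one bibliographic record to {SDG label: matched terms}."""
    text = " ".join(
        [record.get("title", ""), record.get("abstract", ""),
         " ".join(record.get("keywords", []))]
    ).lower()
    assignments = {}
    for sdg, query in SDG_QUERIES.items():
        ok, evidence = evaluate(query, text)
        if ok:
            assignments[sdg] = evidence
    return assignments


# Single-paper classification.
paper = {
    "title": "Low-cost wastewater treatment for rural sanitation",
    "abstract": "We evaluate a filtration unit in three villages.",
    "keywords": ["sanitation", "filtration"],
}
single_result = classify(paper)

# Batch classification is the same function applied to each record.
batch = [paper, {"title": "Solar panels and wind turbines"}]
batch_results = [classify(r) for r in batch]
```

Because every assignment is returned together with the terms that satisfied its query, the same rules give the same answer on every run and each label can be traced back to specific words in the metadata. Whole-word matching is one design choice among several; a production rule set would likely also need stemming, phrase proximity, or field-specific operators, none of which are shown here.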