The number of publications related to the Sustainable Development Goals (SDGs) continues to grow. These publications span a diverse spectrum of research, from the humanities and social sciences to engineering and health. Because funding bodies need to monitor outcomes and impacts, linking publications to the relevant SDGs is critical, but it remains time-consuming and difficult given the breadth and complexity of the goals. A publication may relate to several goals, since the goals are interconnected, and tagging it accurately therefore requires multidisciplinary knowledge. Machine learning approaches are promising and have proven particularly valuable for automating manual data labeling and for text classification. In this study, we used over 82,000 publications from an Australian university as a case study. We applied a similarity measure to map these publications onto the SDGs, and we used the OpenAI GPT model to perform the same task, enabling a comparative analysis of the two approaches. Experimental results show that about 82.89% of the results obtained by the similarity measure overlap (share at least one tag) with the outputs of the GPT model. The adopted model (the similarity measure) can complement the GPT model for SDG classification. Furthermore, deep learning methods, which include the similarity measure used here, are more accessible and trusted for handling sensitive data, because they require neither commercial AI services nor the expensive computing resources needed to operate large language models. Our study demonstrates how a carefully crafted combination of the two methods can achieve reliable results for mapping research to the SDGs.
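To make the reported overlap figure concrete, the following is a minimal Python sketch of how an "at least one shared tag" overlap rate could be computed. It is not the authors' implementation. The function name `overlap_rate`, the publication IDs, and the toy tag sets are hypothetical, and the choice of denominator (publications that received at least one tag from the similarity measure) is an assumption, since the abstract does not state it.

```python
def overlap_rate(similarity_tags, gpt_tags):
    """Fraction of similarity-tagged publications that share >= 1 SDG with GPT.

    Both arguments map a publication ID to a collection of SDG numbers (1-17).
    Assumption: the denominator is the set of publications that received at
    least one tag from the similarity measure; a publication with no GPT
    tags counts as a non-overlap.
    """
    # Keep only publications the similarity measure actually tagged.
    tagged = {pid: set(tags) for pid, tags in similarity_tags.items() if tags}
    if not tagged:
        return 0.0
    # A hit is any publication whose two tag sets intersect.
    hits = sum(
        1 for pid, tags in tagged.items()
        if tags & set(gpt_tags.get(pid, ()))
    )
    return hits / len(tagged)


# Hypothetical toy data, not taken from the study.
similarity_tags = {"pub1": {3, 10}, "pub2": {7}, "pub3": {13, 15}, "pub4": {4}}
gpt_tags = {"pub1": {3}, "pub2": {9, 11}, "pub3": {15}, "pub4": {4, 5}}

rate = overlap_rate(similarity_tags, gpt_tags)
print(f"{rate:.2%}")  # 75.00%: pub1, pub3 and pub4 share a tag; pub2 does not
```

Because a single shared tag counts as agreement, this measure is lenient for publications that carry several tags. A stricter comparison, such as the Jaccard index per publication, would give lower values on the same data.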