In 2015, the United Nations put forward 17 Sustainable Development Goals (SDGs) to be achieved by 2030. Data has been promoted both as a focus for innovation in sustainable development and as a means of measuring progress towards the SDGs. In this study, we propose a systematic approach to discovering data types and sources that can be used for SDG research. The proposed method combines a systematic mapping approach, which uses manual qualitative coding over a corpus of SDG-related research literature, with a subsequent automated process that applies rules to extract data entities computationally. We illustrate the approach with an analysis of literature relating to SDG 7 and present the results in this paper. The paper concludes with a discussion of the approach and suggests future work to extend the method with more advanced NLP and machine learning techniques.