Distributed Ledger Technologies (DLTs) have evolved rapidly, creating a need for comprehensive insight into their diverse components. However, a systematic literature review that emphasizes the Environmental, Sustainability, and Governance (ESG) components of DLT is still lacking. To bridge this gap, we selected 107 seed papers to build a citation network of 63,083 references and refined it to a corpus of 24,539 publications for analysis. We then labeled the named entities in 46 papers according to twelve top-level categories derived from an established technology taxonomy, and enhanced the taxonomy by pinpointing DLT's ESG elements. Using this labeled dataset, we fine-tuned a pre-trained transformer-based language model for a Named Entity Recognition (NER) task. We used the fine-tuned model to distill the corpus to 505 key papers, enabling a literature review via named entities and temporal graph analysis of DLT evolution in the context of ESG. Our contributions are a methodology for conducting a machine learning-driven systematic literature review in the DLT field, with special emphasis on ESG aspects, and a first-of-its-kind NER dataset of 54,808 named entities designed for DLT and ESG-related explorations.
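The first stage of the pipeline, expanding seed papers into a citation network and then refining it to a corpus, can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the paper identifiers, the `references` mapping, the `max_depth` parameter, and the full-text availability filter are all hypothetical assumptions, since the abstract does not state the actual refinement criteria.

```python
from collections import deque


def build_citation_network(seeds, references, max_depth=1):
    """Collect papers and (citing, cited) edges reachable from the seeds.

    `references` maps a paper id to the ids it cites (hypothetical toy input).
    `max_depth` limits how many citation hops are followed (an assumption).
    """
    edges = set()
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        paper, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand references beyond the hop limit
        for cited in references.get(paper, []):
            edges.add((paper, cited))
            if cited not in seen:
                seen.add(cited)
                queue.append((cited, depth + 1))
    return seen, edges


def refine_corpus(papers, has_full_text):
    """Keep papers that pass a filter. Full-text availability is used here
    purely as a stand-in criterion (an assumption, not the paper's rule)."""
    return sorted(p for p in papers if has_full_text.get(p, False))


# Toy data standing in for the 107 seed papers and their references.
references = {
    "seed_a": ["p1", "p2"],
    "seed_b": ["p2", "p3"],
    "p1": ["p4"],
}
has_full_text = {
    "seed_a": True, "seed_b": True, "p1": True, "p2": False, "p3": True,
}

papers, edges = build_citation_network(["seed_a", "seed_b"], references)
corpus = refine_corpus(papers, has_full_text)
print(len(edges), corpus)
```

In this toy run the shared reference `p2` is stored once as a paper but appears in two edges, which is why a citation network's reference count can exceed its number of distinct publications.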