Recent years have brought significant advances in Natural Language Processing (NLP), enabling rapid progress in computational job market analysis. Core tasks in this application domain are the extraction and classification of skills from job postings. Because the field is growing quickly and is interdisciplinary, it has not yet been assessed exhaustively. This survey aims to fill this gap by providing a comprehensive overview of the deep learning methodologies, datasets, and terminology specific to NLP-driven skill extraction and classification. Our catalog of publicly available datasets addresses the lack of consolidated information on how these datasets were created and what their characteristics are. Finally, our focus on terminology addresses the current lack of consistent definitions for important concepts, such as hard and soft skills, and for terms relating to skill extraction and classification.