Named entity recognition (NER) is used to extract information such as names and dates from documents and texts. Extracting education and work-experience information from resumes is important for filtering them. Because all information in a resume otherwise has to be entered into a company's system manually, automating this process saves companies time. In this study, a deep learning-based semi-automatic named entity recognition system has been implemented, focusing on resumes in the IT field. First, resumes of employees from five different IT-related fields were annotated. Six transformer-based pre-trained models, selected from among popular models in the natural language processing field, were then adapted to the named entity recognition task using the annotated data. The resulting system can recognize eight entity types: city, date, degree, diploma major, job title, language, country, and skill. The models were compared using micro, macro, and weighted F1 scores. On the test set, RoBERTa achieved the best micro and weighted F1 scores, while the Electra model achieved the best macro F1 score.
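The three averages rank models differently because they weight entity types differently. Micro F1 pools counts over all types, macro F1 averages per-type scores equally, and weighted F1 averages per-type scores by their support. The sketch below illustrates this. It assumes entity-level exact-match counting of true positives (TP), false positives (FP), and false negatives (FN) per entity type, which the abstract does not specify. The `Counts` class, the `f1_averages` helper, and the toy counts are hypothetical and are not results from this study.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Counts:
    """Entity-level counts for one entity type (hypothetical structure)."""
    tp: int  # predicted entities that exactly match a gold entity
    fp: int  # predicted entities with no matching gold entity
    fn: int  # gold entities the model missed


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_averages(per_type: dict[str, Counts]) -> dict[str, float]:
    # Micro: pool TP/FP/FN over all entity types, then compute one F1.
    tp = sum(c.tp for c in per_type.values())
    fp = sum(c.fp for c in per_type.values())
    fn = sum(c.fn for c in per_type.values())
    micro = f1(tp, fp, fn)

    # Per-type F1 scores feed both the macro and the weighted average.
    scores = {name: f1(c.tp, c.fp, c.fn) for name, c in per_type.items()}

    # Macro: unweighted mean, so a rare type counts as much as a frequent one.
    macro = sum(scores.values()) / len(scores)

    # Weighted: mean weighted by support, i.e. the number of gold entities (TP + FN).
    support = {name: c.tp + c.fn for name, c in per_type.items()}
    total = sum(support.values())
    weighted = sum(scores[n] * support[n] for n in scores) / total if total else 0.0

    return {"micro": micro, "macro": macro, "weighted": weighted}


if __name__ == "__main__":
    # Toy counts (invented for illustration): a frequent, well-recognized type
    # and a rare type with many false positives.
    toy = {
        "skill": Counts(tp=90, fp=10, fn=10),
        "country": Counts(tp=1, fp=3, fn=0),
    }
    print(f1_averages(toy))
```

With these toy counts the rare "country" type pulls macro F1 well below micro and weighted F1, while it barely moves the support-weighted score. The same three averages are available in scikit-learn's `f1_score` through `average="micro"`, `"macro"`, or `"weighted"`.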