Skill extraction and recommendation systems have been studied from the perspectives of recruiters, applicants, and educators. While AI applications for job advertisements have received broad attention, gaps on the taught-skills side, that is, the skills conveyed by courses, remain a challenge. In this work, we address the scarcity of publicly available datasets by releasing both manually annotated and synthetic datasets of university course and skill pairs, with skills drawn from the European Skills, Competences, Qualifications and Occupations (ESCO) taxonomy, and by publishing the corresponding annotation guidelines. Specifically, we match graduate-level university courses with skills from the Systems Analysts and the Management and Organization Analysts ESCO occupation groups at two granularities: course title with a skill, and course sentence with a skill. We train language models on this dataset to serve as baselines for retrieval and recommendation systems for course-to-skill and skill-to-course matching. We evaluate the models on a held-out portion of the annotated data. Our BERT model achieves an F1-score of 87%, indicating that course and skill matching is a feasible task.
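As a rough illustration of the reported metric, the sketch below computes an F1-score for binary course-skill match decisions. It assumes that each (course text, skill) pair carries a label of 1 for a match and 0 for no match; the function name `f1_score` and the toy labels are hypothetical and are not taken from the released datasets or from the evaluation code behind the 87% figure.

```python
from typing import Sequence


def f1_score(gold: Sequence[int], pred: Sequence[int]) -> float:
    """F1 for the positive class (1 = course and skill match)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)

    # With no true positives, precision and recall are both zero.
    if tp == 0:
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for six course-skill pairs (illustrative only).
gold = [1, 0, 1, 1, 0, 1]
pred = [1, 0, 0, 1, 1, 1]
score = f1_score(gold, pred)
print(f"F1 = {score:.2f}")
```

The same function applies at either granularity (course title with a skill, or course sentence with a skill), since both reduce to a binary decision per pair under the assumption above.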