In this paper, we propose a method for rating resumes using Latent Dirichlet Allocation (LDA) and entity detection with spaCy. The method first uses spaCy's Named Entity Recognition (NER) to extract relevant entities from a resume, such as education, experience, and skills. The LDA model then uses these entities to rate the resume by assigning topic probabilities to each entity. We also analyze entity detection with spaCy's NER in detail and report its evaluation metrics. Using LDA, the proposed system decomposes resumes into latent topics and extracts meaningful semantic representations. Our goal is a resume score driven by content rather than by structure and keyword matching. Under this goal, the model achieves 77% accuracy when only skills are considered and 82% overall accuracy when all attributes (such as college name, work experience, degree, and skills) are considered.
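The pipeline described above (entity extraction followed by LDA over the extracted entities) can be illustrated with a minimal, self-contained sketch. This is not the authors' implementation. It rests on several assumptions:

- A hypothetical keyword gazetteer (`SKILLS`, `extract_entities`) stands in for spaCy's NER model.
- LDA is fit with a small pure-Python collapsed Gibbs sampler (`lda_gibbs`) rather than a library such as gensim.
- The abstract does not say how topic probabilities become a score. Here we assume, purely for illustration, that a resume is scored by the cosine similarity between its topic mixture and that of a job description.

```python
import math
import random

# ASSUMPTION: a hypothetical skill gazetteer standing in for spaCy NER output.
SKILLS = {"python", "java", "sql", "tensorflow", "excel", "marketing", "seo", "sales"}


def extract_entities(text):
    """Return skill tokens found in the text (hypothetical stand-in for spaCy NER)."""
    tokens = [t.strip(".,;:()").lower() for t in text.split()]
    return [t for t in tokens if t in SKILLS]


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic distribution per document."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    w_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # document-topic counts
    nkw = [[0] * V for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                       # total tokens per topic
    z = []                                    # topic assignment of every token
    for d, doc in enumerate(docs):            # random initial assignments
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w_id[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], w_id[w]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    # Posterior-mean estimate of each document's topic proportions (theta).
    return [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
            for d, doc in enumerate(docs)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Usage with hypothetical data: one job description and two resumes.
job = "Seeking a data engineer with Python, SQL and TensorFlow experience."
resumes = {
    "A": "Built ETL pipelines in Python and SQL; trained models with TensorFlow.",
    "B": "Led marketing campaigns, SEO audits and sales reporting in Excel.",
}
docs = [extract_entities(job)] + [extract_entities(r) for r in resumes.values()]
theta = lda_gibbs(docs, n_topics=2)
# ASSUMPTION: score = similarity of a resume's topic mixture to the job's mixture.
scores = {name: cosine(theta[0], th) for name, th in zip(resumes, theta[1:])}
print(scores)
```

Running LDA on extracted entities rather than on raw text keeps the topic model focused on resume content such as skills, in line with the content-driven goal stated above. In a real system, spaCy's trained NER and a tested LDA library would replace the stand-ins used here.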