The growing reliance on online recruitment platforms, coupled with the adoption of AI technologies, has highlighted the need for efficient resume classification methods. However, small datasets, the lack of standardized resume templates, and privacy concerns limit the accuracy and effectiveness of existing classification models. In this work, we address these challenges with a comprehensive approach to resume classification. We curated a large-scale dataset of 13,389 resumes from diverse sources and employed Large Language Models (LLMs) such as BERT and Gemma1.1 2B for classification. Our results show significant improvements over traditional machine learning approaches, with our best model achieving a top-1 accuracy of 92% and a top-5 accuracy of 97.5%. These findings underscore the importance of dataset quality and advanced model architectures for the accuracy and robustness of resume classification systems in online recruitment.
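For readers unfamiliar with the reported metrics, the following is a minimal sketch of how top-k accuracy is commonly computed: a prediction counts as correct when the true category is among the k highest-scoring classes. The function name `top_k_accuracy`, the toy scores, and the three-class setup are illustrative assumptions, not the evaluation code or data used in this work.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> float:
    """Fraction of examples whose true label is among the k highest-scoring classes.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one example is required")
    if k < 1:
        raise ValueError("k must be a positive integer")

    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


# Toy example with three made-up job categories (indices 0, 1, 2).
scores = [
    [0.7, 0.2, 0.1],  # true class 0: ranked first
    [0.4, 0.5, 0.1],  # true class 0: ranked second
    [0.1, 0.3, 0.6],  # true class 2: ranked first
    [0.5, 0.3, 0.2],  # true class 2: ranked third
]
labels = [0, 0, 2, 2]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```

Under this definition, top-5 accuracy can never be lower than top-1 accuracy, since the top five classes always include the top one.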