Automatically extracting and standardizing occupations from free-text job postings and resumes is crucial for applications such as job recommendation and labor market policy formation. This paper introduces LLM4Jobs, a novel unsupervised methodology that taps into the capabilities of large language models (LLMs) for occupation coding. LLM4Jobs uniquely harnesses both the natural language understanding and the natural language generation capabilities of LLMs. Through rigorous experiments on synthetic and real-world datasets, we demonstrate that LLM4Jobs consistently surpasses state-of-the-art unsupervised baselines, showing its versatility across diverse datasets and levels of granularity. As a by-product of our work, we present both synthetic and real-world datasets, which may be instrumental for subsequent research in this domain. Overall, this investigation highlights the promise of contemporary LLMs for the intricate task of occupation extraction and standardization, laying the foundation for a robust and adaptable framework relevant to both research and industrial contexts.