Job-SDF: A Multi-Granularity Dataset for Job Skill Demand Forecasting and Benchmarking

In a rapidly evolving job market, skill demand forecasting is crucial: it enables policymakers and businesses to anticipate and adapt to change, keeping workforce skills aligned with market needs and thereby enhancing productivity and competitiveness. By identifying emerging skill requirements, it also directs individuals towards relevant training and education opportunities, promoting continuous self-learning and development. However, the absence of comprehensive datasets is a significant obstacle to research in this field. To bridge this gap, we present Job-SDF, a dataset designed to train and benchmark job skill demand forecasting models. Built from 10.35 million public job advertisements collected from major online recruitment platforms in China between 2021 and 2023, the dataset covers monthly recruitment demand for 2,324 types of skills across 521 companies. It uniquely enables the evaluation of skill demand forecasting models at multiple granularities, including the occupation, company, and regional levels. We benchmark a range of models on this dataset, evaluating their performance in standard scenarios, in predictions focused on lower value ranges, and in the presence of structural breaks, providing new insights for further research. Our code and dataset are publicly available at https://github.com/Job-SDF/benchmark.
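
To make the notion of multi-granularity monthly demand concrete, the following is a minimal sketch, not the Job-SDF pipeline. It assumes that demand for a skill is the number of job advertisements mentioning it in a given month, which the abstract does not state. The records, the field names (`month`, `company`, `occupation`, `region`, `skills`), and the helper `monthly_demand` are all hypothetical and do not reflect the released dataset schema.

```python
from collections import Counter

# Toy job ads. Every field name and value is hypothetical, not the Job-SDF schema.
ADS = [
    {"month": "2021-01", "company": "A", "occupation": "engineer",
     "region": "Beijing", "skills": ["python", "sql"]},
    {"month": "2021-01", "company": "B", "occupation": "analyst",
     "region": "Shanghai", "skills": ["sql"]},
    {"month": "2021-02", "company": "A", "occupation": "engineer",
     "region": "Beijing", "skills": ["python"]},
]


def monthly_demand(ads, granularity=None):
    """Count ads mentioning each skill per month.

    Assumption: demand = number of ads that mention the skill.
    If `granularity` names a field (e.g. "region"), counts are split by it.
    """
    counts = Counter()
    for ad in ads:
        for skill in set(ad["skills"]):  # count each skill once per ad
            if granularity is None:
                key = (ad["month"], skill)
            else:
                key = (ad["month"], ad[granularity], skill)
            counts[key] += 1
    return counts


overall = monthly_demand(ADS)
by_region = monthly_demand(ADS, "region")
by_company = monthly_demand(ADS, "company")
```

Under these assumptions, each key of `overall` identifies one point of a monthly skill series, and passing `"occupation"`, `"company"`, or `"region"` yields the corresponding finer-grained series that a forecasting model would be evaluated on.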
