CAREER: A Foundation Model for Labor Sequence Data

Labor economists regularly analyze employment data by fitting predictive models to small, carefully constructed longitudinal survey datasets. Although machine learning methods offer promise for such problems, these survey datasets are too small to take full advantage of them. In recent years, large datasets of online resumes have also become available, providing data about the career trajectories of millions of individuals. However, standard econometric models cannot take advantage of their scale or incorporate them into the analysis of survey data. To this end, we develop CAREER, a foundation model for job sequences. CAREER is first fit to large, passively collected resume data and then fine-tuned to smaller, better-curated datasets for economic inferences. We fit CAREER to a dataset of 24 million job sequences from resumes and fine-tune it on small longitudinal survey datasets. We find that CAREER forms accurate predictions of job sequences, outperforming econometric baselines on three widely used economics datasets. We further find that CAREER can be used to form accurate predictions of other downstream variables. For example, incorporating CAREER into a wage model provides better predictions than the econometric models currently in use.
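The pretrain-then-fine-tune workflow described above can be illustrated with a toy sketch. This abstract does not describe CAREER's architecture, so the sketch swaps in a much simpler, plainly different technique: a count-based first-order Markov model of occupation transitions. "Pretraining" estimates transition counts from a large resume corpus. "Fine-tuning" adds down-weighted resume counts to the counts from a small survey dataset. The function names, the `prior_weight` parameter, and the occupation labels are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def transition_counts(sequences):
    """Count occupation-to-occupation transitions in a list of job sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def pretrain(resume_sequences):
    """Toy 'pretraining' (hypothetical): estimate transition counts from large resume data."""
    return transition_counts(resume_sequences)


def fine_tune(prior_counts, survey_sequences, prior_weight=0.1):
    """Toy 'fine-tuning' (hypothetical): combine down-weighted resume counts with survey counts.

    prior_weight (a hypothetical knob) sets how strongly the large resume
    corpus influences the adapted model relative to the small survey data.
    """
    survey_counts = transition_counts(survey_sequences)
    combined = defaultdict(Counter)
    for prev in set(prior_counts) | set(survey_counts):
        for nxt, c in prior_counts.get(prev, {}).items():
            combined[prev][nxt] += prior_weight * c
        for nxt, c in survey_counts.get(prev, {}).items():
            combined[prev][nxt] += c
    return combined


def predict_next(model, current_job):
    """Return a probability distribution over the next occupation (empty if unseen)."""
    row = model.get(current_job)
    if not row:
        return {}
    total = sum(row.values())
    return {job: c / total for job, c in row.items()}


# Usage with made-up data: resumes supply many transitions, the survey only a few.
resumes = [["cashier", "clerk", "manager"], ["cashier", "clerk"], ["clerk", "manager"]]
survey = [["cashier", "cook"]]
model = fine_tune(pretrain(resumes), survey, prior_weight=0.5)
print(predict_next(model, "cashier"))  # clerk and cook each get probability 0.5
```

Down-weighting the resume counts reflects the abstract's framing. The large corpus supplies broad structure, while the smaller, better-curated survey data is trusted more for the final estimates. In this toy version, `prior_weight` plays that role. A neural foundation model would instead adapt shared learned parameters.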
