Transformer-based generative models can simulate complex sequential structures in many application domains. We extend this line of work to the social sciences by introducing a Transformer-based generative model tailored to longitudinal socio-economic data. We make two contributions: (i) a novel encoding method that represents socio-economic life histories as sequences, including events that overlap across life domains; and (ii) an adaptation of generative modelling techniques to simulate plausible alternative life trajectories conditioned on past histories. Using large-scale data from the Italian social security administration (INPS), we show that the model can be trained at scale, reproduces realistic labour market patterns consistent with known causal relationships, and generates coherent hypothetical life paths. This work demonstrates the feasibility of generative modelling for socio-economic trajectories and opens new opportunities for policy-oriented research, with counterfactual generation as a particularly promising application.