Identity is one of the most commonly studied constructs in social science. However, despite extensive theoretical work on identity, there remains a need for additional empirical data to validate and refine existing theories. This paper introduces a novel approach to studying identity by enhancing word embeddings with socio-demographic information. As a proof of concept, we demonstrate that our approach successfully reproduces and extends established findings regarding gendered self-views. Our methodology can be applied in a wide variety of settings, allowing researchers to tap into a vast pool of naturally occurring data, such as social media posts. Unlike similar methods already introduced in computer science, our approach allows for the study of differences between social groups. This could be particularly appealing to social scientists and may encourage the faster adoption of computational methods in the field.