In several branches of the social sciences and humanities, surveys based on standardized questionnaires are a prominent research tool. Although the resulting data can be analyzed in many ways, a few standard procedures have become established. When researchers use these procedures to analyze differences in the answer patterns of different groups (e.g., by country, gender, or age), the results are meaningful only if measurement invariance holds, i.e., the measured construct is psychometrically equivalent across groups. Sauerwein et al. (2021) recently raised as an open problem the need for new evaluation methods that work in the absence of measurement invariance. This paper promotes an unsupervised learning-based approach to such research data and proposes a procedure with three phases: data preparation, clustering of questionnaires, and measuring similarity based on the obtained clustering and the properties of each group. We generate synthetic data for three data sets, which allows us to compare our approach with the PCA approach both when measurement invariance holds and when it is violated. Our main result is that the approach provides a natural comparison between groups and a natural description of their response patterns. Moreover, it can be safely applied to a wide variety of data sets, even in the absence of measurement invariance. Finally, the approach allows us to translate (violations of) measurement invariance into a meaningful measure of similarity.
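The three phases named in the abstract (data preparation, clustering of questionnaires, and measuring similarity between groups) can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm or the similarity measure, so everything concrete here is an assumption made for illustration: a hand-written k-means stands in for the clustering step, group similarity is taken as one minus the total variation distance between per-group cluster shares, and the synthetic generator, the group names `A`, `B`, `C`, and all function names are hypothetical.

```python
import random
from collections import Counter


def synthetic_group(rng, n, p_high):
    """Hypothetical generator: n respondents answer 4 five-point items,
    all high (4-5) with probability p_high, otherwise all low (1-2)."""
    rows = []
    for _ in range(n):
        lo, hi = (4, 5) if rng.random() < p_high else (1, 2)
        rows.append([rng.randint(lo, hi) for _ in range(4)])
    return rows


def standardize(rows):
    """Phase 1 (data preparation): rescale each item to zero mean, unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def kmeans(points, k, iters=50, seed=0):
    """Phase 2 (clustering of questionnaires): plain k-means, an assumed
    stand-in for whatever clustering method the paper actually uses."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each questionnaire to its nearest center (squared Euclidean).
        labels = [min(range(k),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
                  for p in points]
        # Move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels


def cluster_shares(labels, groups, k):
    """Fraction of each group's questionnaires that fall into each cluster."""
    shares = {}
    for g in sorted(set(groups)):
        counts = Counter(lab for lab, gg in zip(labels, groups) if gg == g)
        n = sum(counts.values())
        shares[g] = [counts.get(j, 0) / n for j in range(k)]
    return shares


def similarity(p, q):
    """Phase 3 (assumed measure): 1 minus the total variation distance
    between two cluster-share vectors; 1.0 means identical shares."""
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p, q))


rng = random.Random(42)
data, groups = [], []
# Groups A and B share a response-pattern mix; group C differs.
for name, p_high in [("A", 0.5), ("B", 0.5), ("C", 0.9)]:
    rows = synthetic_group(rng, 200, p_high)
    data += rows
    groups += [name] * len(rows)

labels = kmeans(standardize(data), k=2)
shares = cluster_shares(labels, groups, k=2)
sim_ab = similarity(shares["A"], shares["B"])
sim_ac = similarity(shares["A"], shares["C"])
print(f"similarity(A, B) = {sim_ab:.2f}, similarity(A, C) = {sim_ac:.2f}")
```

In this toy setting, groups with the same mix of response patterns should receive a higher similarity score than groups with different mixes. Because the comparison uses only each group's distribution over clusters, it does not presuppose that a common factor structure holds across groups.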