Guidance on how to validate computational text-based measures of social science constructs is fragmented. Although scholars generally acknowledge the importance of validating their text-based measures, they often lack common terminology and a unified framework to do so. This paper introduces ValiTex, a new validation framework designed to assist scholars in validly measuring social science constructs based on textual data. The framework draws on long-established validity concepts in psychometrics and extends them to cover the specific needs of computational text analysis. ValiTex consists of two components: a conceptual framework and a dynamic checklist. Whereas the conceptual framework provides a general structure for approaching validation along distinct phases, the dynamic checklist defines specific validation steps and provides guidance on which steps can be considered recommended (i.e., providing relevant and necessary validation evidence) or optional (i.e., useful for providing additional supporting validation evidence). We demonstrate the utility of the framework by applying it to a use case of detecting sexism in social media data.