Guidance on how to validate computational text-based measures of social science constructs is fragmented. Although scholars generally acknowledge the importance of validating their text-based measures, they often lack common terminology and a unified framework for doing so. This paper introduces a new validation framework called ValiTex, designed to assist scholars in measuring social science constructs based on textual data. The framework draws on a long-established tradition within psychometrics while extending it for the purpose of computational text analysis. ValiTex consists of two components: a conceptual model and a dynamic checklist. Whereas the conceptual model provides a general structure for approaching validation along distinct phases, the dynamic checklist defines specific validation steps and provides guidance on which steps might be considered recommended (i.e., providing relevant and necessary validation evidence) or optional (i.e., useful for providing additional supporting validation evidence). The utility of the framework is demonstrated by applying it to a use case of detecting sexism in social media data.