Guidance on how to validate computational text-based measures of social science constructs is fragmented. While scholars generally acknowledge the importance of validating their text-based measures, they often lack a common terminology and a unified framework for doing so. This paper introduces ValiTex, a new validation framework designed to assist scholars in validly measuring social science constructs based on textual data. ValiTex requires researchers to demonstrate three types of validity evidence: substantive evidence (outlining the theoretical underpinning of the measure), structural evidence (examining the properties of the text model and its output), and external evidence (testing how the measure relates to independent information). In addition to the framework, ValiTex offers practical guidance through a checklist that can be adapted to different use cases. The checklist defines and outlines specific validation steps and offers an informed assessment of how important each step is for establishing validity. We demonstrate the utility of the framework by applying it to a use case of detecting sexism in social media data.