For randomized trials that use text as an outcome, traditional approaches for assessing treatment impact require that each document first be manually coded for constructs of interest by trained human raters. This process, the current standard, is both time-consuming and limiting: even the largest human coding efforts are typically constrained to measure only a small set of dimensions across a subsample of available texts. In this work, we present an inferential framework that increases the power of an impact assessment, given a fixed human-coding budget, by using any ``untapped'' observations -- those documents not manually scored due to time or resource constraints -- as a supplementary resource. Our approach, which combines causal inference, survey sampling methods, and machine learning, has four steps: (1) select and code a sample of documents; (2) build a machine learning model to predict the human-coded outcomes from a set of automatically extracted text features; (3) generate machine-predicted scores for all documents and use these scores to estimate treatment impacts; and (4) adjust the final impact estimates using the residual differences between human-coded and machine-predicted outcomes. As an extension to this approach, we also develop a strategy for identifying an optimal subset of documents to code in Step 1 in order to further enhance precision. Through an extensive simulation study based on data from a recent field trial in education, we show that our proposed approach can be used to reduce the scope of a human-coding effort while maintaining nominal power to detect a significant treatment impact.
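The four steps above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`fit_linear`, `model_assisted_impact`), the ordinary least-squares predictor, the simple random sample of coded documents, and the simulated data are all assumptions made for illustration. The estimator shown is a generic model-assisted difference estimator: the difference in mean predicted scores between arms, plus the difference in mean residuals among the human-coded documents.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit with an intercept; returns a prediction function.

    Stand-in for the machine learning model in Step 2 (an assumption;
    any predictor of the human-coded score could be used here).
    """
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ beta


def model_assisted_impact(X, z, coded_idx, y_coded):
    """Residual-adjusted impact estimate (hypothetical helper).

    X         : (n_docs, n_features) automatically extracted text features
    z         : (n_docs,) treatment indicator, 1 = treated, 0 = control
    coded_idx : indices of the documents that were human-coded (Step 1)
    y_coded   : human-coded scores for those documents, in the same order
    """
    X = np.asarray(X, dtype=float)
    z = np.asarray(z)
    coded_idx = np.asarray(coded_idx)
    y_coded = np.asarray(y_coded, dtype=float)

    # Step 2: learn to predict human scores from text features.
    predict = fit_linear(X[coded_idx], y_coded)

    # Step 3: score every document and take the difference in means.
    y_hat = predict(X)
    unadjusted = y_hat[z == 1].mean() - y_hat[z == 0].mean()

    # Step 4: correct with the human-minus-machine residuals,
    # compared across arms within the coded sample.
    resid = y_coded - y_hat[coded_idx]
    z_coded = z[coded_idx]
    adjustment = resid[z_coded == 1].mean() - resid[z_coded == 0].mean()

    return unadjusted + adjustment


# Simulated data (purely illustrative, not from the field trial).
rng = np.random.default_rng(0)
n_docs, n_coded = 2000, 200
z = rng.integers(0, 2, size=n_docs)
X = rng.normal(size=(n_docs, 3)) + 0.3 * z[:, None]
y = X @ np.array([1.0, 0.5, -0.2]) + 0.4 * z + rng.normal(scale=0.5, size=n_docs)

# Step 1: simple random sample of documents to human-code (an assumption).
coded_idx = rng.choice(n_docs, size=n_coded, replace=False)

tau_hat = model_assisted_impact(X, z, coded_idx, y[coded_idx])
print(f"adjusted impact estimate: {tau_hat:.3f}")
```

When every document is coded, the adjustment cancels the predictions exactly and the estimate reduces to the ordinary difference in mean human scores, which is a useful sanity check. This sketch fits the model and computes residuals on the same coded sample; a more careful version would likely need sample splitting or a similar device to avoid overfitting bias, and it does not attempt the optimal selection of documents described as an extension.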