Understanding the causal effects of text on downstream outcomes is a central task in many applications. Estimating such effects requires researchers to run controlled experiments that systematically vary textual features. While large language models (LLMs) hold promise for generating text, producing and evaluating controlled variation with them requires careful attention. In this paper, we present an end-to-end pipeline for the generation and causal estimation of latent textual interventions. Our pipeline first performs hypothesis generation and steering via sparse autoencoders (SAEs), followed by robust causal estimation, addressing both the computational and statistical challenges of text-as-treatment experiments. We demonstrate that naive estimation of causal effects suffers from significant bias because text inherently conflates treatment and covariate information. We characterize the estimation bias induced in this setting and propose a solution based on covariate residualization. Our empirical results show that our pipeline effectively induces variation in target features and mitigates estimation error, providing a robust foundation for causal effect estimation in text-as-treatment settings.