Quantifying the effects of textual interventions in social systems, such as reducing the anger in social media posts to measure the impact on engagement, is challenging. Real-world interventions are often infeasible, so researchers must rely on observational data. Traditional causal inference methods, typically designed for binary or discrete treatments, are ill-suited to complex, high-dimensional textual data. This paper addresses these challenges by proposing CausalDANN, a novel approach that estimates causal effects using text transformations facilitated by large language models (LLMs). Unlike existing methods, our approach accommodates arbitrary textual interventions and leverages text-level classifiers with domain adaptation ability to produce effect estimates that are robust to domain shift, even when only the control group is observed. This flexibility in handling various text interventions is a key advance in causal estimation for textual data, offering opportunities to better understand human behavior and to develop effective interventions within social systems.