Adjusting for latent covariates is crucial for estimating causal effects from observational textual data. Most existing methods account only for confounding covariates, which affect both treatment and outcome, and this can yield biased effect estimates. The bias arises because non-confounding covariates, which are relevant to only the treatment or only the outcome, are not properly accounted for. In this work, we aim to mitigate the bias by unveiling interactions between the different variables and disentangling the non-confounding covariates when estimating causal effects from text. The disentangling process ensures that each group of covariates contributes only to its respective objective, enabling independence between variables. Additionally, we impose a constraint that balances the representations of the treatment group and the control group to alleviate selection bias. We conduct experiments on two different treatment factors under various scenarios, and the proposed model significantly outperforms recent strong baselines. Furthermore, our analysis of earnings call transcripts demonstrates that the model can effectively disentangle the variables, and further investigation of real-world scenarios provides guidance to help investors make informed decisions.
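The abstract does not specify the form of the balance constraint. As an illustration only, the sketch below assumes one simple possibility: penalizing the squared Euclidean distance between the mean representation of the treatment group and that of the control group. The function names `group_mean` and `balance_penalty`, the binary treatment encoding (1 for treated, 0 for control), and the choice of discrepancy are all assumptions made for this example, not the model proposed in the paper.

```python
from typing import List, Sequence

Vector = Sequence[float]


def group_mean(reps: Sequence[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not reps:
        raise ValueError("group must contain at least one representation")
    dim = len(reps[0])
    return [sum(r[i] for r in reps) / len(reps) for i in range(dim)]


def balance_penalty(reps: Sequence[Vector], treatments: Sequence[int]) -> float:
    """Squared distance between treated and control mean representations.

    Hypothetical stand-in for a balance constraint: the value is 0 when
    both groups have the same mean representation and grows as the two
    groups drift apart. Assumes treatments are coded 1 (treated) / 0 (control).
    """
    treated = [r for r, t in zip(reps, treatments) if t == 1]
    control = [r for r, t in zip(reps, treatments) if t == 0]
    mu_t = group_mean(treated)
    mu_c = group_mean(control)
    return sum((a - b) ** 2 for a, b in zip(mu_t, mu_c))


if __name__ == "__main__":
    # Toy 2-dimensional representations: two treated units, two control units.
    reps = [[1.0, 0.0], [3.0, 0.0], [0.0, 0.0], [0.0, 2.0]]
    treatments = [1, 1, 0, 0]
    print(balance_penalty(reps, treatments))
```

In a training setup, a term of this kind would typically be added to the outcome loss with a weighting coefficient, so that the learned representations predict the outcome while remaining comparable across the two groups.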