This paper explores the use of unstructured, multimodal data, namely text and images, in causal inference and treatment effect estimation. We propose a neural network architecture adapted to the double machine learning (DML) framework, specifically the partially linear model. A further contribution is a new method for generating a semi-synthetic dataset that can be used to evaluate causal effect estimation when text and images act as confounders. We evaluate the proposed methods and architectures on this semi-synthetic dataset and compare them with standard approaches, highlighting the potential benefit of using text and images directly in causal studies. Our findings are relevant to researchers and practitioners in economics, marketing, finance, medicine, and data science more broadly who are interested in estimating causal quantities from non-traditional data.
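
For orientation, the partially linear model targeted by DML is commonly written as below. The notation is an assumption drawn from the standard DML literature, not from this abstract: $Y$ is the outcome, $D$ the treatment, $X$ the confounders (here, text and image inputs), $\theta_0$ the causal parameter of interest, and $g_0$, $m_0$, $\ell_0$ nuisance functions that, in a setting like this one, would be learned by flexible models such as neural networks.

```latex
% Partially linear model (standard DML form; notation assumed, not from the source)
\begin{align}
  Y &= \theta_0 D + g_0(X) + \varepsilon, & \mathbb{E}[\varepsilon \mid D, X] &= 0, \\
  D &= m_0(X) + V,                        & \mathbb{E}[V \mid X] &= 0.
\end{align}

% Partialling-out estimator, with \ell_0(X) = \mathbb{E}[Y \mid X]
% and nuisance estimates \hat{\ell}, \hat{m} fitted on held-out folds (cross-fitting)
\begin{equation}
  \hat{\theta}
  = \Bigl( \sum_{i=1}^{n} \hat{V}_i^{2} \Bigr)^{-1}
    \sum_{i=1}^{n} \hat{V}_i \bigl( Y_i - \hat{\ell}(X_i) \bigr),
  \qquad \hat{V}_i = D_i - \hat{m}(X_i).
\end{equation}
```

Under this reading, text and images enter only through the nuisance predictions $\hat{\ell}(X_i)$ and $\hat{m}(X_i)$, while $\theta_0$ is estimated from the residuals.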