Estimating a unit's response to interventions with an associated dose, the "conditional average dose response" (CADR), is relevant in a variety of domains, from healthcare to business, economics, and beyond. Such a response typically needs to be estimated from observational data, which introduces several challenges. The machine learning (ML) community has therefore proposed several tailored CADR estimators. Yet most of these methods rely on strong assumptions about the distribution of the data and the assignment of interventions, assumptions that go beyond the standard ones in causal inference. Whereas previous work has focused on smooth shifts in covariate distributions across doses, in this work we study CADR estimation from clustered data, where different doses are assigned to different segments of a population. On a novel benchmarking dataset, we show the impact of clustered data on model performance and propose an estimator, CBRNet, that learns cluster-agnostic, and hence dose-agnostic, covariate representations through representation balancing for unbiased CADR inference. We run extensive experiments to illustrate the workings of our method and compare it with the state of the art in ML for CADR estimation.