Design-based edge-level causal inference with machine learning assisted covariate adjustment

We study design-based causal inference for edge-level outcomes in directed networks under dyadic interference. In this setting, outcomes are defined on directed edges and depend on the joint treatment assignments of pairs of units, inducing a complex dependence structure that invalidates standard estimation and inference procedures developed for node-level data. We construct Horvitz--Thompson estimators for a general class of edge-level causal effects and establish their asymptotic normality under mild regularity conditions. To enable valid inference, we develop variance estimators that exploit identifiable components of network dependence, yielding substantially less conservative bounds than classical approaches. To improve efficiency, we incorporate auxiliary covariates through a sample splitting and cross-fitting procedure. A key technical challenge is that standard two-fold sample splitting fails in the presence of edge-level outcomes due to the dependence induced by shared units. To address this issue, we introduce a three-fold sample splitting and cross-fitting scheme that restores the conditional independence required for unbiased estimation. Under a stability condition, the resulting covariate-adjusted estimator is asymptotically normal and accommodates both linear adjustment and flexible machine learning methods. We further introduce a calibration step that guarantees no asymptotic efficiency loss relative to the unadjusted estimator. Simulation studies and a real-data application confirm the theoretical results and demonstrate substantial efficiency gains.
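As an illustration of the Horvitz--Thompson construction mentioned above, the following is a minimal sketch, not the estimator class developed in the paper. It makes several assumptions that the abstract does not state: treatments are assigned independently with a known Bernoulli probability `p`, every ordered pair of distinct units forms a directed edge, and the target is the average contrast between both endpoints treated and both endpoints untreated. The function names `edge_ht_estimate`, `observed_outcomes`, and `exact_expectation` are hypothetical.

```python
import itertools
import numpy as np


def edge_ht_estimate(y_obs, z, p):
    """Horvitz-Thompson estimate of the average (1,1)-vs-(0,0) edge-level contrast.

    y_obs[i, j]: observed outcome on directed edge i -> j (diagonal ignored).
    z[i]: 0/1 treatment of unit i, ASSUMED drawn i.i.d. Bernoulli(p).
    """
    z = np.asarray(z, dtype=float)
    n = z.shape[0]
    off_diag = ~np.eye(n, dtype=bool)        # directed edges with i != j
    both_treated = np.outer(z, z)            # 1 where Z_i = Z_j = 1
    both_control = np.outer(1 - z, 1 - z)    # 1 where Z_i = Z_j = 0
    # Inverse-probability weights under the assumed design:
    # P(Z_i = Z_j = 1) = p^2 and P(Z_i = Z_j = 0) = (1 - p)^2 for i != j.
    contrib = y_obs * (both_treated / p**2 - both_control / (1 - p) ** 2)
    return contrib[off_diag].sum() / off_diag.sum()


def observed_outcomes(pot, z):
    """Select Y_ij(Z_i, Z_j) from pot, where pot[a, b] is the n x n matrix Y(a, b)."""
    z = np.asarray(z, dtype=int)
    n = z.shape[0]
    rows = np.arange(n)[:, None]
    cols = np.arange(n)[None, :]
    return pot[z[:, None], z[None, :], rows, cols]


def exact_expectation(pot, p):
    """Average the estimator over all 2^n assignments, weighted by their probability."""
    n = pot.shape[-1]
    total = 0.0
    for z in itertools.product([0, 1], repeat=n):
        prob = np.prod([p if zi else 1 - p for zi in z])
        total += prob * edge_ht_estimate(observed_outcomes(pot, z), z, p)
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 4, 0.3
    pot = rng.normal(size=(2, 2, n, n))      # synthetic potential outcomes
    off_diag = ~np.eye(n, dtype=bool)
    tau = (pot[1, 1] - pot[0, 0])[off_diag].mean()
    print("true effect:", tau)
    print("design expectation of estimator:", exact_expectation(pot, p))
```

Enumerating every assignment for a small `n` checks design-unbiasedness exactly rather than by Monte Carlo. The potential outcomes are held fixed and only the assignment is random, which is the design-based viewpoint the abstract takes.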
