Dyadic data is often encountered when quantities of interest are associated with the edges of a network. As such, it plays an important role in statistics, econometrics and many other data science disciplines. We consider the problem of uniformly estimating a dyadic Lebesgue density function, focusing on nonparametric kernel-based estimators taking the form of dyadic empirical processes. Our main contributions include the minimax-optimal uniform convergence rate of the dyadic kernel density estimator, along with strong approximation results for the associated standardized and Studentized $t$-processes. A consistent variance estimator enables the construction of valid and feasible uniform confidence bands for the unknown density function. We showcase the broad applicability of our results by developing novel counterfactual density estimation and inference methodology for dyadic data, which can be used for causal inference and program evaluation. A crucial feature of dyadic distributions is that they may be "degenerate" at certain points in the support of the data, a property that makes our analysis somewhat delicate. Nonetheless, our methods for uniform inference remain robust to the potential presence of such points. For implementation purposes, we discuss inference procedures based on positive semi-definite covariance estimators, mean squared error optimal bandwidth selectors and robust bias correction techniques. We illustrate the empirical finite-sample performance of our methods both in simulations and with real-world trade data, for which we make comparisons between observed and counterfactual trade distributions in different years. Our technical results concerning strong approximations and maximal inequalities are of potential independent interest.
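As a rough illustration of the kind of estimator described above, the following sketch averages kernel weights over all unordered pairs $i < j$ of an $n$-node network. The abstract does not state the estimator's formula, so the form used here, $\hat f(w) = \frac{2}{n(n-1)} \sum_{i<j} \frac{1}{h} K\big((W_{ij} - w)/h\big)$, is an assumption, as are the Epanechnikov kernel, the function names `epanechnikov` and `dyadic_kde`, the simulated data-generating process and the bandwidth value. Boundary correction, bandwidth selection, variance estimation and confidence bands are all omitted.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel, supported on [-1, 1] (an illustrative choice)."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def dyadic_kde(W, grid, h):
    """Kernel density estimate from dyadic (edge-level) data.

    W    : (n, n) array of edge values; only entries with i < j are read.
    grid : 1-D array of evaluation points.
    h    : bandwidth, assumed positive.
    """
    n = W.shape[0]
    i_idx, j_idx = np.triu_indices(n, k=1)   # all unordered pairs i < j
    w = W[i_idx, j_idx]                      # n(n-1)/2 edge observations
    u = (w[None, :] - grid[:, None]) / h     # shape (len(grid), n(n-1)/2)
    # Mean over pairs equals (2 / (n(n-1))) * sum_{i<j} K(u_ij); divide by h.
    return epanechnikov(u).mean(axis=1) / h


# Hypothetical simulated network: node effects A_i plus edge-level noise V_ij.
rng = np.random.default_rng(0)
n = 100
A = rng.normal(size=n)
V = rng.normal(size=(n, n))
W = A[:, None] * A[None, :] + V

grid = np.linspace(-3.0, 3.0, 61)
f_hat = dyadic_kde(W, grid, h=0.4)           # h chosen by hand, not MSE-optimal
```

Because every edge value $W_{ij}$ shares the node-level components $A_i$ and $A_j$ with other edges, the summands are dependent across pairs. This dependence is why the variance and the limiting behaviour of such an estimator differ from the i.i.d. case, even though the point estimate is computed like an ordinary kernel density estimate on the pooled edge values.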