Units with extreme values in the dependent and/or independent variables (i.e., vertical outliers and leverage points) can severely bias regression coefficients and/or standard errors. The problem is particularly acute in short panels, where the researcher cannot appeal to asymptotic theory. Examples include cross-country studies, cell-group analyses, and field or laboratory experiments, in which the structure of the data or the research design forces the researcher to use few cross-sectional observations repeated over time. Available diagnostic tools may fail to detect these anomalies because they are not designed for panel data. In this paper, we formalise statistical measures for panel data models with fixed effects that quantify the degree of leverage and outlyingness of units, as well as the joint and conditional influence of pairs of units. First, we develop a method to visually detect anomalous units in a panel data set and identify their type. Second, we investigate the effect of these units on least squares (LS) estimates and on other units' influence on the estimated parameters. To illustrate and validate the proposed method, we use a synthetic data set contaminated with different types of anomalous units. We also provide an empirical example.
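To make the notions of unit-level leverage and outlyingness more concrete, the following is a minimal sketch and not the paper's own estimator. It assumes a balanced panel and a fixed-effects model estimated by least squares on within-transformed (unit-demeaned) data. It summarises each unit by the average diagonal element of the hat matrix and by the average normalised squared residual. The function names, the synthetic data, and the way the two anomalous units are constructed are all illustrative assumptions.

```python
import numpy as np

def within_transform(a, unit_ids):
    """Subtract each unit's time mean (the fixed-effects 'within' transformation)."""
    a = np.asarray(a, dtype=float)
    demeaned = np.empty_like(a)
    for u in np.unique(unit_ids):
        mask = unit_ids == u
        demeaned[mask] = a[mask] - a[mask].mean(axis=0)
    return demeaned

def unit_diagnostics(y, X, unit_ids):
    """Return units, average leverage per unit, average normalised squared residual per unit."""
    y_w = within_transform(y, unit_ids)
    X_w = within_transform(X, unit_ids)
    beta, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
    resid = y_w - X_w @ beta
    # Diagonal of the hat matrix H = X (X'X)^{-1} X' built from the demeaned regressors.
    xtx_inv = np.linalg.pinv(X_w.T @ X_w)
    h = np.einsum("ij,jk,ik->i", X_w, xtx_inv, X_w)
    norm_sq_resid = resid**2 / np.sum(resid**2)
    units = np.unique(unit_ids)
    leverage = np.array([h[unit_ids == u].mean() for u in units])
    outlyingness = np.array([norm_sq_resid[unit_ids == u].mean() for u in units])
    return units, leverage, outlyingness

# Hypothetical synthetic short panel: N units observed over T periods.
rng = np.random.default_rng(0)
N, T = 10, 5
unit_ids = np.repeat(np.arange(N), T)
X = rng.normal(size=(N * T, 2))

# Unit 1 becomes a (good) leverage unit: extreme regressor values that still follow the model.
X[unit_ids == 1, 0] += np.array([8.0, -8.0, 8.0, -8.0, 8.0])

alpha = rng.normal(size=N)[unit_ids]  # unit fixed effects
y = alpha + X @ np.array([1.0, -0.5]) + 0.1 * rng.normal(size=N * T)

# Unit 0 becomes a vertical outlier: time-varying shocks in y only.
# (A constant shift would be absorbed by the unit's fixed effect.)
y[unit_ids == 0] += np.array([10.0, -10.0, 10.0, -10.0, 10.0])

units, leverage, outlyingness = unit_diagnostics(y, X, unit_ids)
for u, lev, out in zip(units, leverage, outlyingness):
    print(f"unit {u}: avg leverage = {lev:.3f}, avg norm. sq. residual = {out:.3f}")
```

Plotting average leverage against average normalised squared residual for each unit gives one simple way to separate the two kinds of anomaly visually: in this sketch, the leverage unit stands out on the first measure and the vertical outlier on the second.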