Outlier detection is critical to assuring data quality. Outliers may exist in observed data or in data derived from observed data, such as estimates and forecasts. An outlier may indicate a problem with its data generation process or may simply be a true, but unusual, statement about the world. Without making any distributional assumptions, we propose the use of loss functions to detect these outliers in panel data. Part I covers nonnegative data. We axiomatically derive an unsigned loss function. We then develop a signed loss function to account for positive and negative outliers separately. In the case of nominal time, we obtain an exact parametrization of the loss function. A time-invariant loss function permits the comparison of data at multiple times on the same basis. We provide several examples, including one in which the outliers are classified by another variable. Part II covers data of mixed sign. As in Part I, we axiomatically develop unsigned and signed loss functions. We search for optimal values of the loss function parameter using graphs.