Nearly all statistical analyses that inform policy-making are based on imperfect data. For example, the data may suffer from measurement errors, missing values, sample selection bias, or record linkage errors. Analysts must decide how to handle such imperfections, for instance by analyzing only the complete cases or by imputing the missing items under some posited model. These choices can affect estimates and, ultimately, policy decisions. It is therefore prudent for analysts to evaluate how sensitive their estimates and policy decisions are to the assumptions underlying those choices. To this end, we propose that analysts define metrics and visualizations that target the sensitivity of the final decision to the assumptions underlying their approach to handling the data imperfections. Using these visualizations, analysts can assess how much confidence to place in the policy decision reached under their chosen analysis. We illustrate such metrics and corresponding visualizations with two examples: accounting for possible measurement error in the inputs of predictive models of presidential vote share, and imputing missing values when estimating the percentage of children exposed to high levels of lead.
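One way to make a decision-targeted sensitivity metric concrete for the lead example is sketched below. The assumptions here are ours, not the paper's. Missing lead levels are imputed from a normal model fitted to the observed values and then shifted by a sensitivity parameter `delta`. This is a standard delta-adjustment (tipping-point) analysis for departures from missing at random. The "decision" is whether the estimated share of children above a hypothetical 5.0 ug/dL level exceeds a hypothetical 25% cutoff. The function name `prob_decision_holds`, the data, and both thresholds are invented for illustration.

```python
import random
import statistics


def prob_decision_holds(observed, n_missing, delta, level_threshold,
                        policy_cutoff, n_imputations=50, seed=0):
    """Fraction of imputed datasets whose estimated share of children above
    level_threshold exceeds policy_cutoff (i.e., the policy would be triggered).

    Illustrative assumption (not the paper's model): missing values are drawn
    from a normal distribution fitted to the observed values, shifted by delta
    to represent a departure from missing at random.
    """
    rng = random.Random(seed)  # same seed for every delta, so draws are comparable
    mu = statistics.mean(observed)
    sd = statistics.stdev(observed)
    triggered = 0
    for _ in range(n_imputations):
        imputed = [rng.gauss(mu + delta, sd) for _ in range(n_missing)]
        completed = observed + imputed
        share_high = sum(x > level_threshold for x in completed) / len(completed)
        if share_high > policy_cutoff:
            triggered += 1
    return triggered / n_imputations


# Hypothetical blood lead levels (ug/dL) for children with observed tests;
# ten further children are assumed to have missing values.
observed = [2.1, 3.4, 1.8, 5.6, 4.2, 2.9, 6.1, 3.3, 2.5, 4.8]
deltas = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
results = [prob_decision_holds(observed, n_missing=10, delta=d,
                               level_threshold=5.0, policy_cutoff=0.25)
           for d in deltas]

# Text "plot": one row per assumed shift; bar length is proportional to the
# fraction of imputations in which the policy cutoff is crossed.
for d, p in zip(deltas, results):
    print(f"delta={d:3.1f}  {p:4.2f}  " + "#" * round(20 * p))
```

Reading the rows from top to bottom shows how far the missing values would have to depart from the missing-at-random assumption before the decision flips. Plotting `results` against `deltas` gives the graphical counterpart. Reusing the same seed for every `delta` keeps the underlying random draws fixed, so differences between rows reflect only the assumed shift.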