Cross-country comparisons based on standardized test scores can be misleading unless we account for the sample selection bias created by dropout and non-enrollment. In this paper, I address this issue by identifying the counterfactual distribution of achievement (that is, the distribution of achievement that would be observed if there were no selection) from the observed distribution of achievement. International comparison measures such as means, quantiles, and inequality measures should be computed from that counterfactual distribution, which is statistically closer to the observed one when the proportion of out-of-school children is low. I identify the quantiles of this latent distribution by readjusting the percentile levels of the observed quantile function of achievement. Because test-score data are by nature truncated, I rely on auxiliary data to borrow identification power. Finally, I apply the method to compute selection-corrected means using PISA 2018 and PASEC 2019 and find that country rankings and comparisons can change.
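To make the idea of readjusting percentile levels concrete, here is a minimal sketch under a deliberately strong assumption that is not taken from the paper: every out-of-school child would score below every tested child. In that case the latent distribution satisfies F*(y) = p + (1 - p) F_obs(y), so the latent quantile at level tau equals the observed quantile at level (tau - p) / (1 - p). The function names, the parameter `p_out` (the out-of-school share), and the example scores are all hypothetical. The paper's actual identification relies on auxiliary data and may differ.

```python
import math


def empirical_quantile(sorted_scores, level):
    """Linearly interpolated empirical quantile for a level in [0, 1]."""
    position = level * (len(sorted_scores) - 1)
    lower = math.floor(position)
    upper = min(lower + 1, len(sorted_scores) - 1)
    weight = position - lower
    return (1.0 - weight) * sorted_scores[lower] + weight * sorted_scores[upper]


def latent_quantile(observed_scores, tau, p_out):
    """Quantile of the hypothetical no-selection distribution at level tau.

    Illustrative assumption (not the paper's): all out-of-school children
    would score below all tested children, so the observed quantile function
    is read at the readjusted level (tau - p_out) / (1 - p_out).
    """
    if not 0.0 <= p_out < 1.0:
        raise ValueError("p_out must lie in [0, 1)")
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    if tau <= p_out:
        # Below the out-of-school share, test scores alone say nothing.
        return float("nan")
    adjusted_level = (tau - p_out) / (1.0 - p_out)
    return empirical_quantile(sorted(observed_scores), adjusted_level)


# Hypothetical scores, for illustration only.
scores = [300.0, 350.0, 400.0, 450.0, 500.0]
median_no_selection = latent_quantile(scores, 0.5, 0.0)  # observed median
median_corrected = latent_quantile(scores, 0.5, 0.2)     # lower than observed
```

With no out-of-school children the corrected median coincides with the observed one. With a 20 percent out-of-school share it falls to roughly 375 in this toy sample, which illustrates why rankings can change once selection is accounted for.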