This study introduces an approach for estimating the uncertainty in bibliometric indicator values that is caused by data errors. The approach uses Bayesian regression models, estimated from empirical data samples, to predict error-free data. Direct Monte Carlo simulation (drawing predicted data from the estimated regression models a large number of times for the same input data) yields probability distributions for indicator values, which quantify their uncertainty due to data errors. It is demonstrated how uncertainty in base quantities, such as the number of a unit's publications of certain document types and the number of citations of a publication, can be propagated along a measurement model into final indicator values. The method can be used to estimate the uncertainty of indicator values arising from any error source whose error distribution is known. The approach is demonstrated with simple synthetic examples for instructive purposes and with real bibliometric research evaluation data to show its possible application in practice.
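The propagation step described above can be illustrated with a minimal sketch. It is not the study's implementation: the Bayesian regression models are replaced here by two fixed, hypothetical predictive distributions (a Poisson model for citations missed by the data source and a Bernoulli model for whether a record really is a citable document type), and the function name `simulate_indicator` and the parameters `p_citable` and `miss_rate` are assumptions made for illustration only. The indicator is the mean number of citations per citable publication.

```python
import numpy as np


def simulate_indicator(observed_citations, p_citable, miss_rate,
                       n_draws=10_000, seed=0):
    """Monte Carlo draws of mean citations per citable publication.

    Assumed (hypothetical) error models, standing in for fitted regressions:
      * citations: true count = observed + Poisson(miss_rate * observed)
      * document type: record i is truly citable with probability p_citable[i]
    """
    rng = np.random.default_rng(seed)
    observed = np.asarray(observed_citations)
    p = np.asarray(p_citable, dtype=float)

    # Draw predicted error-free citation counts for every publication,
    # n_draws times for the same input data.
    missed = rng.poisson(miss_rate * observed, size=(n_draws, observed.size))
    true_citations = observed + missed

    # Draw predicted document-type membership for every publication.
    citable = rng.random((n_draws, observed.size)) < p

    # Measurement model: propagate both base quantities into the indicator.
    n_pubs = citable.sum(axis=1)
    total = (true_citations * citable).sum(axis=1)
    mean_citations = total / np.maximum(n_pubs, 1)
    mean_citations[n_pubs == 0] = np.nan  # indicator undefined without publications
    return mean_citations


if __name__ == "__main__":
    draws = simulate_indicator(
        observed_citations=[10, 3, 0, 25],
        p_citable=[1.0, 0.9, 0.5, 1.0],
        miss_rate=0.05,
    )
    low, median, high = np.nanpercentile(draws, [2.5, 50, 97.5])
    print(f"median {median:.2f}, 95% interval [{low:.2f}, {high:.2f}]")
```

The resulting array of draws is an empirical probability distribution for the indicator value; its spread reflects only the two assumed error sources, and in an actual application the draws would come from the posterior predictive distributions of the fitted models.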