This study introduces an approach for estimating the uncertainty in bibliometric indicator values that is caused by data errors. The approach uses Bayesian regression models, estimated from empirical data samples, to predict error-free data. Direct Monte Carlo simulation, in which predicted data are drawn from the estimated regression models a large number of times for the same input data, yields probability distributions for indicator values; these distributions quantify the uncertainty due to data errors. The study shows how uncertainty in base quantities, such as the number of publications of certain document types attributed to a unit or the number of citations of a publication, can be propagated through a measurement model into final indicator values. The method can be used to estimate the uncertainty of indicator values arising from error sources with known error distributions. The approach is illustrated with simple synthetic examples for instructive purposes and with real bibliometric research evaluation data to show its possible application in practice.
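The propagation idea can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the study's actual models: the Beta distributions below are hypothetical stand-ins for posterior draws from fitted Bayesian regression models, the indicator (mean citations per publication) is an assumed measurement model, and the function names and input data are invented for the example.

```python
import random
import statistics


def simulate_indicator(observed_citations, n_draws=10_000, seed=0):
    """Draw error-corrected data many times and recompute the indicator.

    observed_citations: recorded citation counts, one per recorded publication.
    Returns a list of simulated indicator values (mean citations per publication).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # ASSUMPTION: hypothetical posterior draws for two error rates.
        # p_keep: probability that a recorded publication truly belongs to the unit.
        # q_valid: probability that a recorded citation is a correct match.
        p_keep = rng.betavariate(95, 5)
        q_valid = rng.betavariate(98, 2)

        # Base quantity 1: which recorded publications are kept in this draw.
        kept = [c for c in observed_citations if rng.random() < p_keep]
        if not kept:
            continue  # indicator undefined when no publications remain

        # Base quantity 2: predicted error-free citation count per publication.
        true_cites = [sum(rng.random() < q_valid for _ in range(c)) for c in kept]

        # Assumed measurement model: mean citations per publication.
        draws.append(sum(true_cites) / len(true_cites))
    return draws


def summarize(draws):
    """Return (2.5% quantile, median, 97.5% quantile) of the simulated values."""
    s = sorted(draws)
    n = len(s)
    lower = s[int(0.025 * n)]
    upper = s[min(n - 1, int(0.975 * n))]
    return lower, statistics.median(s), upper


if __name__ == "__main__":
    # Invented input data, for illustration only.
    recorded = [12, 3, 0, 7, 25, 1, 9, 4]
    print(summarize(simulate_indicator(recorded)))
```

The spread between the lower and upper quantiles is the uncertainty in the indicator value that is attributable to the assumed data errors. Replacing the two Beta draws with draws from models fitted to validated data samples would bring the sketch closer to the approach described above.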