Many disciplines need to assign items to categories or classes with known labels. This is often done by one or more expert raters, or sometimes by an automated process. If these assignments, or 'ratings', are difficult to make, a common tactic is to have them repeated by different raters, or even by the same rater on different occasions. We present an R package, rater, available on CRAN, that implements Bayesian versions of several statistical models for analysing repeated categorical rating data. These models allow inference about the true underlying (latent) class of each item, as well as the accuracy of each rater. The models are based on, and include, the Dawid-Skene model, and are implemented in the Stan probabilistic programming language. We illustrate the use of rater with several examples. We also discuss in detail the techniques of marginalisation and conditioning, which these models require but which also apply more generally to other models implemented in Stan.
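As a rough illustration of what marginalisation and conditioning mean for a Dawid-Skene-style model, the sketch below uses a class prevalence vector `pi` and one confusion matrix `theta[j]` per rater, where `theta[j][k][l]` is the probability that rater j gives label l to an item whose true class is k. It sums each item's latent class out of the likelihood (marginalisation), then recovers the class probabilities for a given item (conditioning). This is a hypothetical Python sketch, not code from the rater package (which uses R and Stan). The function names and the parameter values are made up for illustration, and the parameters are held fixed here, whereas in a Bayesian fit they would be estimated.

```python
import math

def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def item_log_terms(ratings, log_pi, log_theta):
    """Return log p(z = k, ratings) for each class k of one item.

    ratings: list of (rater j, label l) pairs, 0-based (hypothetical format).
    log_pi[k]: log prevalence of class k.
    log_theta[j][k][l]: log P(rater j gives label l | true class k).
    """
    terms = []
    for k in range(len(log_pi)):
        lp = log_pi[k]
        for j, l in ratings:
            # Ratings are assumed conditionally independent given the true class.
            lp += log_theta[j][k][l]
        terms.append(lp)
    return terms

def marginal_log_lik(items, log_pi, log_theta):
    """Marginalisation: sum out each item's latent class via log-sum-exp."""
    return sum(log_sum_exp(item_log_terms(r, log_pi, log_theta)) for r in items)

def class_posterior(ratings, log_pi, log_theta):
    """Conditioning: P(z = k | ratings), normalising the per-class terms."""
    terms = item_log_terms(ratings, log_pi, log_theta)
    norm = log_sum_exp(terms)
    return [math.exp(t - norm) for t in terms]

# Made-up example: 2 classes, 2 raters.
pi = [0.7, 0.3]
theta = [
    [[0.9, 0.1], [0.2, 0.8]],  # rater 0
    [[0.8, 0.2], [0.3, 0.7]],  # rater 1
]
log_pi = [math.log(p) for p in pi]
log_theta = [[[math.log(p) for p in row] for row in rater] for rater in theta]

items = [[(0, 1), (1, 1)], [(0, 0)]]  # item 1 rated twice, item 2 once
print(marginal_log_lik(items, log_pi, log_theta))
print(class_posterior(items[0], log_pi, log_theta))
```

Working on the log scale with log-sum-exp avoids numerical underflow when an item has many ratings. Marginalising in this way matters for Stan in particular, because Stan's samplers cannot sample discrete parameters such as the latent classes directly.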