To measure the degree of agreement between two observers who independently classify $n$ subjects into $K$ categories, it is common to use various kappa-type coefficients, the most popular of which is the $κ_C$ coefficient (Cohen's kappa). As $κ_C$ has some weaknesses (such as its poor performance under highly unbalanced marginal distributions), the $Δ$ coefficient, based on the $delta$ response model, is sometimes used instead. This model also yields other parameters: (a) the contribution $α_i$ of each category $i$ to the value of the global agreement $Δ=\sum α_i$; and (b) the consistency $\mathcal{S}_i$ of category $i$ (the degree of agreement in category $i$), a more appropriate parameter than the kappa value obtained by collapsing the data into category $i$. It has recently been shown that the classic estimator $\hatκ_C$ underestimates $κ_C$, and a new, less biased estimator $\hatκ_{CU}$ has been obtained. This article demonstrates that something similar happens with the known estimators $\hatΔ$, $\hatα_i$, and $\hat{\mathcal{S}}_i$ of $Δ$, $α_i$, and $\mathcal{S}_i$, respectively; it proposes new, less biased estimators $\hatΔ_U$, $\hatα_{iU}$, and $\hat{\mathcal{S}}_{iU}$, determines their variances, analyses the behaviour of all the estimators, and concludes that the new estimators should be used when $n$ or $K$ is small (at least when $n\leq 50$ or $K\leq 3$). Additionally, the case where one of the raters is a gold standard is considered, in which two new parameters arise: the $conformity$ (the rater's ability to recognize a subject of category $i$) and the $predictivity$ (the reliability of the rater's response $i$).
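For reference (the formula is not stated in this abstract, but it is the standard one), the classic estimator $\hatκ_C$ is computed from the $K\times K$ table of observed classification proportions $p_{ij}$, with row and column marginals $p_{i\cdot}$ and $p_{\cdot i}$:

$$\hatκ_C = \frac{p_o - p_e}{1 - p_e}, \qquad p_o = \sum_{i=1}^{K} p_{ii}, \qquad p_e = \sum_{i=1}^{K} p_{i\cdot}\, p_{\cdot i},$$

where $p_o$ is the observed proportion of agreement and $p_e$ the proportion of agreement expected by chance under independent marginals.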