Clustered data are common in biomedical research. Observations in the same cluster are often more similar to each other than to observations from other clusters. The intraclass correlation coefficient (ICC), first introduced by R. A. Fisher, is frequently used to measure this degree of similarity. However, the ICC is sensitive to extreme values and skewed distributions, and depends on the scale of the data. It is also not applicable to ordered categorical data. We define the rank ICC as a natural extension of Fisher's ICC to the rank scale, and describe its corresponding population parameter. The rank ICC can be interpreted simply as the rank correlation between a random pair of observations from the same cluster. We also extend the definition to settings where the underlying distribution has more than two levels of hierarchy. We describe estimation and inference procedures, show the asymptotic properties of our estimator, conduct simulations to evaluate its performance, and illustrate our method in three real data examples with skewed data, count data, and three-level data.
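To make the interpretation above concrete, the following is a minimal sketch of a plug-in calculation: replace each observation by its midrank in the pooled sample, then take the Fisher-style correlation over all ordered pairs of distinct observations within the same cluster. The function names `midranks` and `naive_rank_icc` are hypothetical, and the equal weighting of within-cluster pairs is an assumption made for illustration; this is not necessarily the estimator or weighting scheme developed in the paper.

```python
import numpy as np


def midranks(x):
    """Return midranks (average ranks for ties) of a 1-D array."""
    x = np.asarray(x, dtype=float)
    sorted_x = np.sort(x)
    # Number of values strictly below, and number at or below, each x.
    lower = np.searchsorted(sorted_x, x, side="left")
    upper = np.searchsorted(sorted_x, x, side="right")
    return (lower + upper + 1) / 2.0


def naive_rank_icc(values, clusters):
    """Illustrative rank ICC: Fisher-style ICC computed on pooled midranks.

    Assumption: every ordered within-cluster pair gets equal weight.
    """
    values = np.asarray(values, dtype=float)
    clusters = np.asarray(clusters)

    centered = midranks(values) - midranks(values).mean()
    total_var = np.mean(centered ** 2)

    cross_sum = 0.0  # sum of products over ordered pairs (j != k)
    n_pairs = 0
    for c in np.unique(clusters):
        r = centered[clusters == c]
        m = r.size
        if m < 2:
            continue  # singleton clusters contribute no pairs
        # sum_{j != k} r_j r_k = (sum r)^2 - sum r^2
        cross_sum += r.sum() ** 2 - np.sum(r ** 2)
        n_pairs += m * (m - 1)

    if n_pairs == 0 or total_var == 0:
        raise ValueError("need at least one within-cluster pair and non-constant data")
    return (cross_sum / n_pairs) / total_var


# Usage: skewed values, three clusters of two observations each.
y = [1.0, 1.5, 20.0, 35.0, 900.0, 1200.0]
g = [0, 0, 1, 1, 2, 2]
icc_raw = naive_rank_icc(y, g)
icc_log = naive_rank_icc(np.log(y), g)  # monotone transform leaves ranks unchanged
```

Because the calculation depends on the data only through ranks, any strictly increasing transformation (here, the logarithm) yields the same value, which is the scale-independence property that motivates moving from Fisher's ICC to the rank scale.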