Metrics to Detect Small-Scale and Large-Scale Citation Orchestration

Citation counts and related metrics are pervasively used, and misused, in academia and research appraisal as measures of scholarly influence and recognition. Understanding the citation patterns of authors is therefore essential for assessing their research impact and contributions within their respective fields. Although the h-index, introduced by Hirsch in 2005, has become a popular bibliometric indicator, it does not account for the intricate relationships between authors and their citation patterns. This limitation becomes particularly relevant when citations are strategically employed to boost the perceived influence of certain individuals or groups, a phenomenon that we term "orchestration". Orchestrated citations can bias citation rankings, so such patterns need to be identified. Here, we use Scopus data to investigate orchestration of citations across all scientific disciplines. Orchestration can be small-scale, when the author alone and/or a small number of other authors use citations strategically to boost citation metrics such as the h-index; or large-scale, when extensive collaborations among many co-authors lead to a high h-index for many or all of them. We propose three orchestration indicators: an extremely low ratio of citations over the square of the h-index (indicative of small-scale orchestration); an extremely small number of authors who can explain at least 50% of an author's total citations (indicative of either small-scale or large-scale orchestration); and an extremely large number of co-authors with more than 50 co-authored papers (indicative of large-scale orchestration). The distributions of these indicators, potential thresholds based on the 1% (and 5%) percentiles, and the insights they offer are explored and put into perspective across science.
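
As an illustration, the three indicators described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the procedure applied to the Scopus data: the function names and input formats are hypothetical, and the greedy selection used for the second indicator is only one plausible way to count the authors who "explain" at least 50% of citations.

```python
from collections import Counter


def h_index(citations_per_paper):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations_per_paper, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites < rank:
            break
        h = rank
    return h


def citations_over_h_squared(citations_per_paper):
    """Indicator 1: total citations divided by the square of the h-index."""
    h = h_index(citations_per_paper)
    if h == 0:
        return None  # ratio is undefined when the h-index is zero
    return sum(citations_per_paper) / (h * h)


def authors_explaining_share(citing_author_sets, share=0.5):
    """Indicator 2 (assumed greedy heuristic): number of citing authors
    needed to cover at least `share` of all received citations.

    `citing_author_sets` holds one collection of author ids per citation.
    """
    total = len(citing_author_sets)
    if total == 0:
        return 0
    uncovered = [set(authors) for authors in citing_author_sets]
    target = share * total
    covered = 0
    picked = 0
    while covered < target:
        counts = Counter(a for authors in uncovered for a in authors)
        if not counts:
            break  # remaining citations carry no author information
        best, gain = counts.most_common(1)[0]
        # Drop every citation that the chosen author accounts for.
        uncovered = [authors for authors in uncovered if best not in authors]
        covered += gain
        picked += 1
    return picked


def heavy_coauthor_count(paper_author_lists, author_id, min_papers=50):
    """Indicator 3: co-authors sharing more than `min_papers` papers."""
    shared = Counter()
    for authors in paper_author_lists:
        for coauthor in set(authors):
            if coauthor != author_id:
                shared[coauthor] += 1
    return sum(1 for n in shared.values() if n > min_papers)
```

For a toy author whose papers are cited 10, 8, 5, 4 and 3 times, `h_index` gives 4 and `citations_over_h_squared` gives 30 / 16 = 1.875. Flagging an author as extreme would then mean comparing such values against percentile thresholds of the full distribution, which this sketch does not attempt.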
