OpenAlex is a promising open source of scholarly metadata and a competitor to established proprietary sources such as the Web of Science and Scopus. Because OpenAlex provides its data freely and openly, researchers can carry out bibliometric studies that others in the community can reproduce without licensing barriers. However, because OpenAlex is evolving rapidly and its data are both expanding and changing quickly, the question naturally arises as to how trustworthy its data are. In this report, we study the reference coverage and selected metadata of OpenAlex, the Web of Science and Scopus, and compare the three databases with one another to help address this open question in bibliometrics. In our large-scale study, we demonstrate that, when restricted to a cleaned dataset of 16.8 million recent publications shared by all three databases, OpenAlex has average source reference numbers and internal coverage rates comparable to those of both the Web of Science and Scopus. We further analyse the metadata in OpenAlex, the Web of Science and Scopus by journal, finding a similarity in the distribution of source reference counts in the Web of Science and Scopus as compared to OpenAlex. We also demonstrate that a journal-level comparison of the other core metadata covered by OpenAlex gives mixed results: compared with both the Web of Science and Scopus, OpenAlex captures more ORCID identifiers, fewer abstracts and a similar number of Open Access status indicators per article.